What does the lab data say about the various causes of high error rates?

  • Updated 5 years ago
For quite some time now, the Das Lab has been working to reduce the higher-than-desired error rates associated with the massively parallel Eterna lab protocol. Some of the causes, such as a batch of reagents going bad, or a specific DNA template strategy that proved to adversely affect RNA amplification, have been clearly identified. In other cases, there are causes which the Lab thinks are relevant (high GC content, large variability in sequence length, ...) but which are less well quantified. The newly announced plan to try making all the sequences in a round the same length is an attempt to see how big an improvement that makes.

What strikes me is that as Eterna players, we have generated a tremendous amount of data about this, since every individual RNA molecule analyzed in the Cloud lab comes with an estimate of the error associated with the SHAPE values for that molecule. This estimate is called the signal-to-noise ratio, often written as S/N ratio. These have been recorded in the RDB files archived in Stanford's RNA Mapping Database, but have not been easily accessible.

But now, Meechl has created a folder of Google spreadsheets containing most, if not all, of the Eterna synthesis data produced by the Cloud Lab. There is data here on many more syntheses than we can see from the game API. For instance, all recent labs have synthesized each solution twice, with two different barcodes, but only one set has been reported in the game. And just as importantly, it is now available in a familiar format, so that any player who can use a spreadsheet to sort, filter and summarize data has full access. Meechl deserves a huge round of applause for this! In addition, Meechl and Eli have created a table in the wiki that gives the round number (and hence the spreadsheet) for every cloud lab, starting with the player projects that were used as pilot experiments.
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes

Posted 5 years ago

Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
There's another on-line resource available for players wanting to understand the lab data in more detail. For all recent labs, Ann has been posting a file of the specific DNA sequences she has ordered for each lab. This file is particularly interesting when there is a marked difference between the two sets of data in an RDAT file. This is the case for the most recent round (87). Ann hasn't added the lab order for round 87 yet, but I will ask her to do that.

If there is interest, I'll post more details about interpreting the order files.
Photo of rhiju

rhiju, Researcher

  • 403 Posts
  • 123 Reply Likes
Kudos to Omei & Meechl & Eli for taking the initiative to do this analysis and present data. The lab team would welcome analysis on what padding or barcode strategies work best for overall signal-to-noise ...
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
Thank you, Rhiju.

Could you elaborate on the hypothesis that led you to try constraining the next synthesis round to a single length? And what analysis of the existing data would you find most useful for confirming or falsifying that hypothesis?
Photo of rhiju

rhiju, Researcher

  • 403 Posts
  • 123 Reply Likes
@brourd sent me a note about some of the longer sequences (which he was using to test a hypothesis about the mysterious structure of poly(A)). He saw low signal-to-noise in those cases.

The hypothesis here is that longer sequences are less efficiently amplified when we do the polymerase chain reaction to amplify the libraries we get from CustomArray. So there are fewer of those RNAs in the final pool, and weaker statistics.

If you can just plot mean S/N vs. length for the different labs so far, that would help us test the hypothesis!
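For players who would rather script this than use spreadsheet filters, here is a minimal sketch of the "mean S/N vs. length" summary. It assumes each row has already been exported as a (sequence, S/N) pair; the function name and the example rows are made up for illustration and are not taken from the actual spreadsheets.

```python
from collections import defaultdict


def mean_sn_by_length(rows):
    """Return {length: mean S/N} for an iterable of (sequence, sn) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # length -> [sum of S/N, count]
    for sequence, sn in rows:
        bucket = totals[len(sequence)]
        bucket[0] += sn
        bucket[1] += 1
    # Sort by length so the result is ready to plot left to right.
    return {length: s / n for length, (s, n) in sorted(totals.items())}


# Made-up rows for illustration only; real rows would come from a spreadsheet export.
example_rows = [("GGAAAC", 4.0), ("GGAAAC", 2.0), ("GGAAAAAAAC", 1.0)]
print(mean_sn_by_length(example_rows))
```

Running the same function once per round (or per RDAT file) would give one curve per lab, which is what the request above asks for.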
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
I haven't gathered enough data to really speak to the length issue yet, but I did notice something worth noting about round 87.

Round 87 has been synthesized twice now, with each synthesis including two "variant" conditions. The first variant, which I have started calling the "unmarked variant", uses 3' padding of (apparently) random bases to bring all the DNA templates to the same length. "Variant 0" (identified by the "-0" suffix on the Sequence_ID field) also uses 3' padding, but with all A's.

Both syntheses showed that the "random" padding produced much better S/N ratios than the "poly-A" padding. In the R87_0000 file, the average S/N ratio is 1.42 for the unmarked version and 0.59 for version 0. In the R87_0001 file, the respective figures are 2.41 and 1.06. So the "random" padding strategy was clearly the better one as far as raising the S/N ratios.
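As a rough sketch of how this comparison could be scripted, the snippet below splits rows by the "-0" suffix on the Sequence_ID and then computes the random-to-poly-A ratio from the four averages quoted above. The exact ID format and the (sequence_id, S/N) row layout are assumptions; only the four averages come from the post.

```python
def split_variants(records):
    """Split (sequence_id, sn) pairs into unmarked and "-0" variant S/N lists."""
    unmarked, variant0 = [], []
    for sequence_id, sn in records:
        # Assumption: variant 0 is marked by a literal "-0" at the end of the ID.
        (variant0 if sequence_id.endswith("-0") else unmarked).append(sn)
    return unmarked, variant0


# Average S/N quoted in the post: (unmarked / random padding, variant 0 / poly-A).
reported = {"R87_0000": (1.42, 0.59), "R87_0001": (2.41, 1.06)}
ratios = {name: round(random_pad / poly_a, 2)
          for name, (random_pad, poly_a) in reported.items()}
print(ratios)
```

With the quoted averages, the random padding comes out roughly 2.4 times better in R87_0000 and roughly 2.3 times better in R87_0001.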

Rhiju, the second synthesis was also significantly better than the first, with the average S/N ratio for the unmarked version going from the lowest reported in recent rounds to something near average for recent rounds. What did you change for the second synthesis?
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
BTW, I have started recording some summary statistics about all the labs in a Google spreadsheet. If you want to suggest other stats, help gather them, or are just curious, feel free to look.
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
Here's a chart of the average S/N ratio vs RNA length for the combined R87_0000 and R87_0001 files.
(The trend curve is a negative exponential.)

It certainly supports the notion that within a given round, longer RNAs tend to have lower S/N ratios than shorter ones. But by itself, it doesn't directly address the question of whether variation of length in a round has a significant effect on S/N ratio.
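A trend curve of the negative exponential kind mentioned above can be approximated outside a spreadsheet too. The sketch below fits S/N ≈ a · exp(-k · length) by ordinary least squares on log(S/N). This is one common way to get such a curve and is not necessarily what the charting tool did; the function name is hypothetical.

```python
import math


def fit_exponential(lengths, sn_values):
    """Fit sn ~ a * exp(-k * length) by least squares on log(sn).

    Requires every S/N value to be positive and at least two distinct lengths.
    Returns (a, k); k > 0 means S/N falls as length grows.
    """
    xs = list(lengths)
    ys = [math.log(v) for v in sn_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope
```

Fitting each round separately and comparing the k values would be one way to quantify how strongly S/N depends on length in that round.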
Photo of Meechl

Meechl

  • 81 Posts
  • 27 Reply Likes
I think it will be interesting to compare two rounds that have no variation in length within themselves, but where one round has longer designs than the other. I believe the early rounds (around R70) didn't vary lengths, but were shorter than the round that's currently in the design phase. If there haven't been any big changes in the experimental procedure, comparing those rounds could help us figure out the problem.
Photo of rhiju

rhiju, Researcher

  • 403 Posts
  • 123 Reply Likes
Agreed -- possible to make a plot of mean length and also S/N 'across rounds'?
Photo of Ann Kladwang

Ann Kladwang, Researcher

  • 6 Posts
  • 2 Reply Likes
Date: 2 Sept 2014

Dear Omei & Meechl & Dr. Rhiju;

I did check my notebook on R87 , here I would like to confirm you both about detail of R87.

a) Both runs of R 87, came from the same prep. RNA, CDNA, ligated.cDNA.

b) Run I & II performed at the same Miseq machine on 18 & 21 Aug 2014 (BC-local at 2nd floor)

c) Run I, cDNA eluted by NaOH

while run II, cDNA eluted by formamide.

This one might be a clue what Omei found " In the R87_0000 file, the average S/N ratio is 1.42 for the unmarked version and 0.59 for version 0. In the R87_0001 file, the respective figures are 2.41 and 1.06. "

Even though NaOH extraction gives me more clusters in the MiSeq run, the signal of the data is lower than with formamide extraction (1.001 vs. 1.11). The quality of the data is also confirmed by Omei's analysis.

So from now on I will stick with extraction cDNA by using formamide.

Dear Omei & Meechl, your time and your observations are really useful and will really help me to get better results in the future. Thank you very much for your analysis.

I really appreciate your help.

Yours,
Ann
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
Thank you, Ann!

Was this the first time you tried using an alternative to formamide? I ask because we're planning on looking at all the results back to the first Cloud Lab. Anything you can tell us about variations in the process will help us interpret the results.
Photo of Ann Kladwang

Ann Kladwang, Researcher

  • 6 Posts
  • 2 Reply Likes
Formamide method started at R79 (CL-11), before that I used NaOH.

The reason I tried NaOH again in R88 is that in the MOHCA experiment I tried NaOH and got more clusters formed with NaOH. So we would like to compare side by side.

Yours,
Ann
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
Thank you again, Ann.

Random padding substantially better than poly-A in Round 87
So now I'm puzzled by the large differences between the random padding results and the poly-A padding results in Round 87, when the difference between the same comparisons was minimal in Rounds 84-86.

Here's a screenshot of my S/N ratio comparison doc.


In Round 87, both of the runs had a random padding S/N ratio well over twice as high as the poly-A. But among the three rounds before that, no run had more than a 15% improvement with random padding. I've collected some other statistics about these runs, but nothing jumps out as being strongly predictive of the difference.

Does anyone have any theories?
Photo of Meechl

Meechl

  • 81 Posts
  • 27 Reply Likes
Okay, here are some old and new graphs. I attempted to show them in the thread instead of linking:

S/N vs Round (Zoomed In)
plot

S/N vs Round BoxPlot (Zoomed In)
plot

Average S/N vs Round
plot

Average S/N and Length vs Round
plot

Average S/N vs Length
plot

Average S/N and Number of Designs vs Round
plot

Average S/N vs Number of Designs in Round
plot

Also, with round 88, the Eterna History Tour labs were the only ones whose results showed up in RMDB. Were the Mimic sequences synthesized?
Photo of Eli Fisker

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
Meechl, this is simply beautiful!

It really does look like the number of designs per round has an effect on the S/N average. When the number of designs is high, the S/N average is weak. So it has an effect, like Rhiju mentioned.



Similarly, like Brourd mentioned, the length of the designs has an effect on the S/N ratio. When the length is long, the S/N ratio is weak.

Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
Meechl, this is fabulous! Getting all the data into one spreadsheet is going to blow up the possibilities.
Photo of rhiju

rhiju, Researcher

  • 403 Posts
  • 123 Reply Likes
wow!
Photo of Meechl

Meechl

  • 81 Posts
  • 27 Reply Likes
I forgot to add Average Signal to Noise vs Average Length plot. I included the round numbers next to the points to show which rounds don't fit with the trend of longer sequences causing lower signal to noise.

plot
Photo of Hyphema

Hyphema

  • 91 Posts
  • 25 Reply Likes
Just thinking from outside the box and not sure this could play a role in the S/N ratio, so ignore me if I am being ludicrous. But could it be the particular labs in round 87 that made random padding so much better than poly-A padding?
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
Thinking outside the box is good, Hyphema!

Do you see anything about the round 87 structures or sequences that distinguishes them from the previous rounds? I can't think of a mechanism that would link the DNA padding choice with the specific labs in round 87. But if you see some pattern in the data that suggests it might be happening, then we should investigate it.
Photo of rhiju

rhiju, Researcher

  • 403 Posts
  • 123 Reply Likes
Just made a blog post in the game to see if other players might be interested in following the discussion.
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
Good idea, Rhiju.

For some reason your post is getting buried behind the older one about ushering in the 3D era, so it's not going to get noticed by many people. I'll send a PM to jnicol about it.
Photo of Eli Fisker

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
I think I still don't fully understand the connection between high error rates and the signal-to-noise ratio.

E.g., I'm wondering about my lab Stabilise the 1-1-1-1-1 multiloop (http://eterna.cmu.edu/web/browse/3644...). A lot of the designs have a rather high error rate. However, when I look at the signal-to-noise ratio for this lab, which was run twice in round 73 (https://docs.google.com/spreadsheets/...), I can't find a single design with a weak signal-to-noise ratio. All have good or medium.



In comparison, my other lab, Early Bulge control from round 83, has a very good signal-to-noise ratio, but I can't see any error rates from within the lab interface. Early Bulge control is a short lab design in a long lab batch, whereas the 1-1-1-1-1 lab is a long design in a short lab batch. The Early Bulge control had a ton of winners, whereas 1-1-1-1-1 only had one.

The designs that typically triggered high error rates and the U shaped pattern I mentioned over in this other post where we discussed error rates also (https://getsatisfaction.com/eternagam...), usually had several longer stems.

I still think there is something to my idea with structures (and their sequences too) having an influence on signal-to-noise ratio.
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
Eli, I think you must have confused the lab (called project in the RDAT files) "Stabilize the 1-1-1-1-1-1 Multiloop", from round 85, with the lab " 1-1-1-1-1-1 Multiloop" from round 73. Here's the summary of the S/N ratios for the former, derived from Meechl's spreadsheets.


And I definitely agree there is something to your ideas about structure and sequences. Let's see how much of the patterns you noticed are captured by the DNA free energy predictions. Although I expect to eventually provide these values for all the synthesized sequences, you might want to go ahead and do some manually.
Photo of Eli Fisker

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
Ah, now things make a lot more sense. :) Thx for clearing up my confusion, Omei. I think what you suspected was exactly what happened. So really error rates and signal-to-noise are the same animal. Just called something different.

I'm really happy to hear that you are considering automating the DNA free energy predictions. But good idea that I should still do some manually. I will do that. I'm thinking about looking at batches that were long and had bad signal-to-noise, and looking at some of the designs with weak S/N and some with good S/N. Similarly, I will look at shorter batches with better signal-to-noise, and do the same. I'm basically interested in whether there is something that makes the bad batches stand out on this account. Also, I am particularly interested in the specific labs that had really bad error rates, plus those labs with the S curve pattern in error rates.
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
What changed between rounds 79 and 80?

I loaded a copy of Meechl's "All Rounds" sheet into a Google Fusion Table, where the built-in database facilities make some things easier. I'll elaborate on the use of fusion tables in a separate post or thread, but for now here's something very interesting I discovered.

Here's a plot of S/N vs Length for round 79

Note that the S/N ratios are less than 8, and there is no marked correlation between length and S/N. This is typical of the rounds up through 79.

Here's the same plot, but for round 80.

Here, the highest S/N ratio has jumped to 48, and shorter sequences are doing better than longer ones. This favoring of shorter sequences has persisted since round 80.
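One way to put a number on "no marked correlation" versus "shorter sequences are doing better" is a per-round Pearson correlation between length and S/N. The sketch below assumes rows of (round_number, length, S/N), for example exported from the "All Rounds" sheet; the row layout and function names are assumptions for illustration.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists, or None if undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # e.g. every design in the round has the same length
    return sxy / math.sqrt(sxx * syy)


def length_sn_correlation_by_round(rows):
    """rows: iterable of (round_number, length, sn). Returns {round: r or None}."""
    grouped = defaultdict(lambda: ([], []))
    for round_number, length, sn in rows:
        grouped[round_number][0].append(length)
        grouped[round_number][1].append(sn)
    return {rnd: pearson(ls, sns) for rnd, (ls, sns) in sorted(grouped.items())}
```

If the pattern described above holds, rounds up through 79 should give values near zero and rounds from 80 onward clearly negative values.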

Ann and Rhiju, can you go back through your notes and see what changed in round 80?

(I don't have the time at the moment to go into details of how to use the fusion table UI to filter, sort and chart data, but the URL is https://www.google.com/fusiontables/D... for anyone that wants to try it. There is plenty of documentation on the Web.)
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
I've looked into automating NUPACK to generate free energy estimates for the two single strands of the DNA templates. It can't easily be done using the Web server, because the server embeds the value in an image. But MIT provides source code, and I have compiled it on my machine without any obvious problems.

I had hesitated to take this on right away because I wanted to get a better overview of all the data before getting too focused on one aspect. But thanks to all of Meechl's work, I'm ready to start digging into the DNA structure hypotheses.
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
Thanks, Eli. For the record, I need to say that your discovery of structure-dependent patterns in the S/N data (the https://getsatisfaction.com/eternagam... thread) got me fired up about digging deeper into the data.

This is an exciting time for Eterna!
Photo of rhiju

rhiju, Researcher

  • 403 Posts
  • 123 Reply Likes
@omei, i feel like we began some of this early in 2014 and made a little progress, but now i think we've reached a critical mass of data -- and player interest -- and online tools (meechl's spreadsheets! fusion!) to make rapid progress.
Photo of rhiju

rhiju, Researcher

  • 403 Posts
  • 123 Reply Likes
I'm happy to report that R88 will be resynthesized by the company -- it's been extremely helpful to have comparisons to prior rounds to demonstrate the truly poor quality of that synthesis.

@omei, About the R80 anomaly, Ann and I will look into it. If any of you all are going to be at our open group meeting tomorrow at 9am PST let's discuss via chat.
Photo of Ann Kladwang

Ann Kladwang, Researcher

  • 6 Posts
  • 2 Reply Likes
R79 (DasLabOrder_R79_10292013.txt) = multiA –padding at 5’end / Length 150 from order files

(start to try longer synthesis 170 residues, Custom array company only guarantee product under 100 residues)

R80 at order 111913 ( 170 residues show very low signal)

Then Jee reordered in the 150 residues in length as below, the result of S/N ratio (average was medium ~ 1.396)

R80 (DasLabOrder_R80_short_random_double_0124142014.txt) = random padding at 5’ & 3’ end / Length 150 residues

I will email order-file of R79 & 80 (150 residues) to Omei and cc Dr. Rhiju
Photo of rhiju

rhiju, Researcher

  • 403 Posts
  • 123 Reply Likes
One more challenge, given work in the thread above -- can we now come up with one function that can predict signal-to-noise for a given round based on properties like #designs, average length, etc.?

Can we make a rough prediction for a given design? If so, then we could try to code it up and present it to players when they submit.
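As a starting point for such a function, here is a minimal sketch of a per-round model, log(mean S/N) ≈ b0 + b1 · (number of designs) + b2 · (mean length), fitted by least squares. The model form, the choice of features and all names are assumptions made for illustration, not something the thread has established; real inputs would be the per-round averages from the summary spreadsheet.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the system is singular (e.g. collinear features).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_round_model(rounds):
    """rounds: list of (n_designs, mean_length, mean_sn) with mean_sn > 0.

    Returns [b0, b1, b2] for log(mean_sn) ~ b0 + b1*n_designs + b2*mean_length.
    """
    features = [[1.0, float(n), float(length)] for n, length, _ in rounds]
    targets = [math.log(sn) for _, _, sn in rounds]
    k = 3
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    xty = [sum(f[i] * t for f, t in zip(features, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_sn(coefficients, n_designs, mean_length):
    """Predicted mean S/N for a round with the given properties."""
    b0, b1, b2 = coefficients
    return math.exp(b0 + b1 * n_designs + b2 * mean_length)
```

A per-design prediction would need per-design features (length, padding variant, perhaps DNA free energy) rather than round averages, but the fitting step could look much the same.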
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
I don't think we're ready yet, but it's a good goal.

One thing we haven't finished doing is deciding what data in RMDB we should probably omit for prediction purposes. Some examples would be

* All but the last file for rounds where there are multiple RDAT files because the earlier ones were recognized as having process-related issues,

* Data for variants that are abnormal and have a plausible explanation, such as 5' padding and round 87's hypothesis of a change in chemicals interacting specifically with the poly-A padding variant.
Photo of rhiju

rhiju, Researcher

  • 403 Posts
  • 123 Reply Likes
@meechl -- I didn't notice issue with missing mimic sequences -- can you send a PM to jnicol asking him to check?
Photo of Meechl

Meechl

  • 81 Posts
  • 27 Reply Likes
Will do!
Photo of Brourd

Brourd

  • 446 Posts
  • 82 Reply Likes
The second round, mimic sequences should be part of the Round 89 order.
Photo of Eli Fisker

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
Filling in chat relevant to the discussion here, for those who were not around:

rhiju: hey guys -- about S/N [7:03 PM]
rhiju: just talking to Ann [7:03 PM]
rhiju: we just described a plan to put up our spreadsheet of notes online [7:03 PM]
rhiju: (its on dropbox now) [7:03 PM]
rhiju: and over the next 2 weeks, i'm going to get help scanning out lab notebooks for each round [7:03 PM]
Omei: Super [7:03 PM]
rhiju: so we'll have PDFs linked in the spreadsheet to our actual notes [7:03 PM]
rhiju: so then you can help us browse this history and ask the incisive questions [7:04 PM]
rhiju: she and i will also be looking at R80, which meechl brought up as 'anomalous' later today [7:04 PM]
rhiju: hope to reply to the getsat post later today [7:04 PM]
Eli Fisker: Good to hear [7:04 PM]
jnicol: meechl asked me why the thermo mimic 2 lab was not uploaded in R88 database, but History tour was [7:05 PM]
Omei: It's not so much that 80 was anomalous, as everything up to 79 fits one pattern and everything after that fits another. [7:05 PM]
Omei: And the pre-80 were better in that they show such dependence on length. [7:06 PM]
Omei: didn't show [7:06 PM]
Brourd: I concur, with Omei's observation. [7:16 PM]
Photo of Eli Fisker

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
Adding the continuation of the chat related to this discussion:

jnicol: Disregard my R88 question, Brourd's lab was in R89 [7:20 PM]
Brourd: Correct [7:20 PM]
jnicol: Hi Brourd! [7:20 PM]
Brourd: Hello! [7:21 PM]
Omei: Hi Brourd! [7:22 PM]
Brourd: Hello! [7:22 PM]
Omei: You haven't commented a lot on the S/N thread. Are you following it closely? [7:23 PM]
Brourd: I follow it enough. [7:23 PM]
Omei: Any particular thoughts? [7:24 PM]
Brourd: None in particular. [7:26 PM]
Brourd: Unless there is a specific aspect of the discussion that you would like to hear my thoughts on, nothing too important comes to mind. [7:31 PM]
Brourd: I can say with some certainty, though, that it seems something changed in the protocol between R79 and future rounds. What that was is beyond what I can know without seeing their notebooks. I would probably suggest that they should order and synthesize another cDNA library for an earlier round (like R78) and see what happens. [7:35 PM]
Omei: No, nothing specific. I just think of you as being someone who actually pays attention to the data and cares about its quality. So you might have noticed something odd that warrants special attention. [7:36 PM]
Omei: Excellent suggestion. [7:38 PM]
Brourd: As Dr. Das pointed out in an e-mail response, R87 had a rather large background number of cDNA counts on the 3' end. [7:38 PM]
Omei: You're going to post in the forum? [7:38 PM]
Omei: Hm. I don't recognize this 87 issue. Can you elaborate? [7:40 PM]
Brourd: A significant number of RNA sequences had reactivity "hotspots." 5' of the barcode, for a length of 5 to 15 or so residues. I inquired into this, and Dr. Das stated that it seems the background reactivity for that stretch was quite high, and may have not been entirely subtracted out. [7:43 PM]
Omei: I see. [7:44 PM]
Brourd: probably easiest seen in a screenshot like this. http://prntscr.com/4jyhlu [7:47 PM]
Omei: Thanks for pointing that out. It seems that hotspot patterns could be indicative of something significant in the experimental process or the data processing. It's not something I have paid much attention to. It's on my radar, now. [7:47 PM]
Brourd: These were particularly bad. SHAPE reactivities as high as 6.XXX for unpaired residues. [7:48 PM]
Omei: Definitely sounds fishy. :-) [7:49 PM]
Omei: Seems like it deserves a forum thread of its own. [7:50 PM]
Brourd: Unfortunately, the reactivity errors, when the original data set for R87 and the rerun were combined, were lowered to the point that they slid under the radar. I believe the mean error for a stretch of 5 exposed, base paired nucleotides, on a single sample was somewhere around 0.2892. [7:52 PM]
Omei: That suggests it might have been an artifact of the NaOH elute that Ann tried in R87.0000, which was judged to have been a bad idea. [7:55 PM]
Photo of rhiju

rhiju, Researcher

  • 403 Posts
  • 123 Reply Likes
@brourd, I like the idea for the pre-R80 data set repeat.

We still have the starting libraries from the company for those rounds; we could re-PCR them and go through the entire protocol and then do comparisons (with player's help on the analysis). If we suspect there are differences in the company libraries, we could reorder one of those libraries, as you suggest.

If you had to nominate a pre-R80 data set to redo, which would it be? You mentioned R78, but based on S/N, R77 was poorer -- perhaps would be worth collecting more data for it. What do others think?
Photo of Brourd

Brourd

  • 446 Posts
  • 82 Reply Likes
Was R77 the round that this news item was referring to?

http://eterna.cmu.edu/web/news/3475223/

That could possibly have been R76 as well, since they were all run around the same time.
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
I'll put a vote in for R79. My rationale is that if we do have difficulty identifying the relevant change between 79 and 80, this limits the factors we have to consider.

Rhiju, when you say you still have the starting libraries, what is in those libraries? I know that you order 6-10 DNA templates for each molecule, but I presumed that you amplified it before you split it up for the purpose of holding some out for re-testing.

Do you actually get many copies for every DNA molecule in the DAS Lab Order file?
Photo of rhiju

rhiju, Researcher

  • 403 Posts
  • 123 Reply Likes
Was R80 when we switched to ordering DNAs of length 150 nts (or 170 nts)? Omei may have the library files from Ann and be able to answer that question. Something catastrophic may happen in the CustomArray synthesis at that point.

If Omei is not available or does not have the DNA library file, we'll find out when all the notes & library files are online (you'll hear from Ann later today on that initiative).
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
I don't have the library files, but the RDAT data shows that Rounds 75 and before had a maximum RNA length of 110 or less, while Round 76 and beyond had a maximum length of 130.

So the last jump in length was Round 76, well before the shift in S/N ratio patterns in Round 80.
Photo of Brourd

Brourd

  • 446 Posts
  • 82 Reply Likes
Perhaps one of the older rounds, like R78, should be reordered, synthesized and sequenced again. If the data is significantly different, it could indicate that something has changed in the protocol, be it the fidelity of the DNA order, equipment issues, or something that has been overlooked.
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
It looks like the only data for R79 and R80 that was entered into RMDB was for the length 150 runs.

I'm curious about what you tried with the length 170 templates. Did you add more padding to the same sequences used in the 150 templates? Or specify new, longer RNA molecules?

I'll be especially interested to see the R79 order files, now knowing that they had at least some 5' padding and yet got good S/N ratios.
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
Using NUPACK to estimate the free energy of the single stranded DNA that is formed during DNA amplification

Rhiju, NUPACK has 3 parameters I can set that will affect the free energy calculation -- sodium concentration, magnesium concentration and temperature. I presume the first two are questions of fact, but the third one will require some judgement on your part, since the actual temperature is continually cycling. What values do you recommend I use?
Photo of rhiju

rhiju, Researcher

  • 403 Posts
  • 123 Reply Likes
60 °C for temperature. For the Na(+) concentration & Mg(2+) concentration, the components of the Phusion polymerase buffer are proprietary, I think, but a rough guess might be 5 mM and 100 mM, respectively.
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
Rhiju, NUPACK only accepts sodium concentrations between 0.05 and 1.10 M. Did you intend 100 mM sodium and 5 mM magnesium?
Photo of rhiju

rhiju, Researcher

  • 403 Posts
  • 123 Reply Likes
yes, sorry, 100 mM sodium and 5 mM magnesium
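The mix-up above is easy to catch with a unit conversion and a range check before any values are handed to NUPACK. The sketch below assumes the quoted 0.05 to 1.10 sodium range is in molar units; the helper names and the dictionary keys are hypothetical and are not NUPACK's own parameter names.

```python
# Sodium range quoted in the thread, taken here to be in mol/L (an assumption).
SODIUM_RANGE_M = (0.05, 1.10)


def millimolar_to_molar(value_mm):
    """Convert a concentration from mM to M."""
    return value_mm / 1000.0


def sodium_in_range(sodium_m):
    """True if the sodium concentration (in M) falls inside the quoted range."""
    low, high = SODIUM_RANGE_M
    return low <= sodium_m <= high


# Conditions agreed in the thread: 60 °C, 100 mM sodium, 5 mM magnesium.
conditions = {
    "temperature_c": 60.0,
    "sodium_m": millimolar_to_molar(100),
    "magnesium_m": millimolar_to_molar(5),
}
print(sodium_in_range(conditions["sodium_m"]))   # 100 mM sodium
print(sodium_in_range(millimolar_to_molar(5)))   # the swapped value, 5 mM sodium
```

The corrected 100 mM sodium passes the check, while the originally suggested 5 mM would have been rejected.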
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
Will do. But the question of what temperature to use got me thinking about the PCR thermal cycle. If our hypothesis about ssDNA folding competing with DNA synthesis is correct, then it would make sense that a change in the thermal cycle would have an effect on the end results.

Is it possible that the thermal cycle changed between R79 and R80?
Photo of rhiju

rhiju, Researcher

  • 403 Posts
  • 123 Reply Likes
omei, that's a very interesting idea (DNA vs. RNA 'deltaG' calcs).

I'm predicting the DNA is the issue.

On the RNA level, note that if reverse transcription fails it presumably fails after getting past the barcode -- and in that case we will still see those counts in our data as a big 'spike' in background at the problematic secondary structure. What we tend to see instead is that some constructs just don't show up at all (this is clearer to us looking at the raw data; not so clear in the final S/N estimate).

Sounds like we'll find out soon about DNA vs. RNA...
Photo of Eli Fisker

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
Just a loose thought. The designs which in the recent rounds have had high error rates have generally had several long stems. However, some of the designs that did best for these labs and rounds had GU's added in them. So now I'm starting to wonder if the GU really helped the RNA form, because it prevented the DNA from binding with itself?
Photo of Eli Fisker

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
One of the very few to have medium signal in Eterna (good in rerun) and scored decent (93%). It was also among the few designs which had GU's

Fluffy Reversed G-C Multibranch




Here is a design that had weak signal and as most, 0 GU's. It also had weak signal in the re-run.

reversedgcmlcbp2-3


It's not that there is a huge difference between the NUPACK images with or without GU. But perhaps the GU is enough to keep the DNA from folding up.

But the first of these 3 labs in the series had a ton of winners compared to the rest. The first was in round 74. The last 2 were in rounds 80 and 82. And for the first one, in the good batch, though GU's were present among several of the winning designs, they weren't required to make a winner. Which they almost were for the later two.
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
Eli, I really like this idea. To do a quick evaluation, I ran NUPACK a few times to see the effect of mutating AU pairs to GU pairs in RNA and DNA. It has a much larger effect on raising the DNA's free energy than it does the RNA. Hardly proof of anything, but definitely supportive of your hypothesis.
Photo of Eli Fisker

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
Hehe, cool. Thx, Omei!
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
In an email conversation, Ann and I realized that the ETERNA_R87_0001 file actually contained data averaged over the NaOH and formamide experiments. She has now added ETERNA_R87_0002, which has only the formamide results. So it's the 0000 and 0002 files we should use to judge the difference between treatments, and presumably the 0002 file that has the best data.

Thanks, Ann!
Photo of Eli Fisker

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
Hehe, thx, Meechl. :)
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
And Meechl comes to our rescue once again! :-) Thank you.

To the extent I now understand what the data files represent, these numbers don't seem too unexpected. 87.1 is based on roughly twice as much data as 87 and 87.2. With a very simple statistical model, doubling the amount of data would reduce the statistical error associated with the individual SHAPE values by a factor of 1 over the square root of two. Waving my hands in the air, I would expect this would have the effect of raising the S/N ratio by a factor of 1.4, which is in the ballpark of what you found.
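The hand-waving above amounts to one line of arithmetic. The sketch below assumes the simplest possible model: the signal stays the same and the noise shrinks in proportion to 1 over the square root of the amount of data. The function name is made up for illustration.

```python
import math


def expected_sn_gain(data_ratio):
    """S/N gain when noise scales as 1/sqrt(amount of data) and signal is unchanged."""
    return math.sqrt(data_ratio)


# Doubling the amount of data, as when two runs are combined into one file.
print(round(expected_sn_gain(2.0), 2))
```

Under this model, combining two equally sized runs should raise S/N by a factor of about 1.41; a noticeably larger or smaller gain would hint that the two runs differ in quality.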
Photo of Eli Fisker

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
Meechl, I like the exact Average S/N numbers you posted from round 87. I can see S/N average numbers for batches in the graphs you have been making, but not the precise numbers. I was wondering if you had a list somewhere. I will be happy to plot them into the Lab WIKI table.
Photo of Meechl

Meechl

  • 81 Posts
  • 27 Reply Likes
Eli, the spreadsheet with all the data has a second sheet with the average S/N for each round. Is that what you're looking for?
Photo of Eli Fisker

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
Yup, I missed that. Big thx! I will fill the numbers in tomorrow.
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
Lab Details on Google Drive

I have created a Das Lab File folder on Google Drive and Ann and I have started work on getting the various lab files there.

The EteRNA all_info spreadsheet is the master document for all synthesis rounds. There are currently three sub-folders: Eterna DNA Templates contains the files that the lab sends to order the DNA templates, Eterna RNA Sequences contains the lab name, design name, synthesis ID and sequence for each RNA in the synthesis, and Eterna Lab Notes contains PDF files with scans of Ann's lab notebook.

To help me (and hopefully others) understand the lab notebooks, I have started a Guide to Reading the Lab Notebooks in the Eterna Lab Notes folder and will continue to add to it as I learn. I have enabled comments for everyone, so if there is something you don't understand (either in the lab notebook itself or in my guide), please add a comment to the guide.
Photo of Eli Fisker

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
Omei, I very much like what you have been up to. This is awesome!
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 968 Posts
  • 304 Reply Likes
Status update on using DNA free energy measurements to predict S/N ratios

I used a partially automated process to generate Nupack's estimates for the DNA equivalents of the sequences for Round 86 and put the results into the fusion table Merge of Omei's copy of R86 SHAPE and R86 DNA energies.

The general impression I have is that while the DNA free energy may be great for flagging a few designs that have very low energies, for the large majority of designs in Round 86 (say those with DNA energy greater than about -14), it doesn't show much predictive potential.
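To make that impression checkable, one simple test is to split the designs at the energy threshold and compare the mean S/N on either side. The sketch below assumes rows of (DNA free energy, S/N) taken from the merged table; the -14 cutoff is the rough figure mentioned above, and the function name and example rows are made up for illustration.

```python
def compare_by_energy_threshold(designs, threshold=-14.0):
    """designs: iterable of (dna_free_energy, sn) pairs.

    Returns (mean S/N below the threshold, mean S/N at or above it).
    A group with no designs is reported as None.
    """
    low, rest = [], []
    for energy, sn in designs:
        (low if energy < threshold else rest).append(sn)

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(low), mean(rest)


# Made-up rows for illustration only, not Round 86 data.
example = [(-20.0, 0.5), (-16.0, 1.5), (-10.0, 3.0), (-5.0, 1.0)]
print(compare_by_energy_threshold(example))
```

A large gap between the two means would support using the energy as a flag for problem designs, even if it has little predictive value within the larger group.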