Switch Scores for EteRNA Switch Puzzles

  • Updated 8 months ago
An exciting direction in EteRNA is the study of riboswitches!

We have recently finished our pilot experiments with great initial success. Using a new technique that measures switching directly on a sequencing chip, we observe the switching behavior of thousands of designs at once. The signal is generated by a fluorescent RNA-binding protein, MS2. Instead of the standard EteRNA score, which is based on the correct folding of each base, we have introduced a new Switch Score.

The Switch Score (0 - 100) has three components:
1) The Switch Subscore (0 - 40)
2) The Baseline Subscore (0 - 30)
3) The Folding Subscore (0 - 30)
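As a quick sanity check on the arithmetic (a sketch only; the formulas behind each subscore are detailed in the PDF mentioned below), the total is simply the sum of the three capped components:

```python
def switch_score(switch_sub, baseline_sub, folding_sub):
    """Total Switch Score: the sum of the three subscores, each within its range."""
    assert 0 <= switch_sub <= 40, "Switch Subscore is 0-40"
    assert 0 <= baseline_sub <= 30, "Baseline Subscore is 0-30"
    assert 0 <= folding_sub <= 30, "Folding Subscore is 0-30"
    return switch_sub + baseline_sub + folding_sub

# A perfect design maxes out all three components:
perfect = switch_score(40, 30, 30)  # 100
```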

The scoring scheme is summarized below. A more detailed description is given in this PDF:
https://drive.google.com/open?id=0B_N0OA9NROPGel80SG5LM0wtZms&authuser=0

A typical example of a switch puzzle is shown below:


The player designs the structures in [1*] and [2]. To observe the switching we then measure the fluorescent signal of MS2, which binds specifically to the MS2 hairpin seen in [2]. In the absence of FMN, the MS2 should bind and the switch is ON. On the other hand, if we introduce FMN, the ligand in [1*], the switch should be OFF and not exhibit fluorescence.

No switch is 100% ON or OFF in the absence or presence of ligand, but a good switch can come very close (and get a perfect EteRNA Switch Score!). At some MS2 concentration, the difference should be large (e.g., at ~100 nM MS2 in the figure below). In practice, we don't know this concentration beforehand, so instead we perform measurements at many concentrations to obtain binding curves. When the switch turns OFF (red curve), the effective dissociation constant increases. The dissociation constant, Kd, is the concentration at which half of the RNA binds MS2.
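To make the binding-curve picture concrete, here is a minimal sketch of the standard 1:1 binding isotherm. The Kd values are hypothetical, chosen only to illustrate an ON and an OFF state:

```python
def fraction_bound(ms2_nM, kd_nM):
    """Simple 1:1 binding isotherm: fraction of RNA bound to MS2."""
    return ms2_nM / (ms2_nM + kd_nM)

# Hypothetical Kd values for illustration only:
kd_on, kd_off = 30.0, 1000.0   # ON-state binds tightly; OFF-state weakly

# At ~100 nM MS2 the two states are easy to tell apart;
# shifting Kd shifts the whole curve horizontally.
signal_on = fraction_bound(100.0, kd_on)
signal_off = fraction_bound(100.0, kd_off)
```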


The Switch Subscore quantifies how far apart the Kd's are in the absence and presence of FMN (horizontal distance between the red and blue curves).

The Baseline Subscore is a measure of how close the ON-state is to the original MS2 hairpin (lower Kd is better, i.e., the blue curve should be far to the left).

The Folding Subscore is high if MS2 binds properly in the ON-state at any concentration (the score should be high for the blue curve at high concentrations of MS2, i.e., high values to the right).

In our first experiments, we found that the easiest score to maximize is the Folding Subscore, followed by the Baseline Subscore. These two ensure that the MS2 hairpin is properly formed in the ON-state. The hard one is the Switch Subscore, which is the highest when the energy difference between the states is finely-tuned to the energy conferred by binding to FMN (or other future ligands).

johana, Researcher

Posted 5 years ago

jnicol, Player developer

Here is the more detailed link: https://s3.amazonaws.com/eterna/labs/...

Eli Fisker

Thx Johana and Jnicol!

This is amazing and long awaited news. It was worth the wait. :)

I have written up a few early thoughts on the data we got back, about what I think might work well for this kind of switch.

Thoughts about the lab results

I look forward to hearing what all of you players think about the results from our favorite fluorescent puzzle. :)

johana, Researcher

Fantastic writeup!!

This was very interesting to read and I think that you found some very fascinating trends regarding the placement of the MS2 hairpin and the distance to the complementary segments.

I'm delighted and hope that a lot of players take a look at your findings.

Great work! I believe that the same labs are now up again for another round, so hopefully your ideas will lead to even better results. In the future we should perhaps also make puzzles that vary the position on purpose to test this idea more systematically.

whbob

In the chart of high baseline & folding sub scores, a delta of about 10 between baseline and switch scores could be a trend for higher overall scores.  
If the switch works better with the MS2 towards the middle, between the aptamer halves, does that seem to indicate that the MS2 is happier when it has more buffer space on either side of its stem?

salish99

Nice calcs on the Stanford description by Johan.

jandersonlee

"The hard one is the Switch Subscore, which is the highest when the energy difference between the states is finely-tuned to the energy conferred by binding to FMN (or other future ligands)."

Does that mean we should be designing our switches with a greater delta FE (free energy) between the two modes? I think many players target about 1/2 of the 4.86 kcal binding bonus between the two states. Should we target a higher delta instead?

johana, Researcher

Thanks for a very good question!
The free energy difference makes a big difference and we did not provide any details about this for the first round. Some information can now be found in the PDF.

Targeting 1/2 the binding bonus is the right thinking for the traditional switching puzzles. Here, the situation is more complicated since we are using MS2 as a secondary readout. The observed switching will depend on both the free energy difference and the MS2 concentration.

In the PDF, there are a couple of plots showing the (theoretical) effects of different free energy differences (delta G's) for the OFF- and the ON-switch. For these plots, I assumed an energy bonus of 4 kcal/mol for the FMN binding, since I believe that this is closer to the concentrations used in the new RNA array experiments (200 uM FMN).

For the OFF-switch depicted in the image in the original post above, the maximum switching score can be obtained for a delta G between -2 and 0 kcal/mol (favoring the ON-state). The closer you get to 0, the more likely you are to get a maximum switch subscore. However, beyond 0 you will also likely get a lower Baseline Score. We ended up defining the scores to give you a little bit of freedom, so I would aim for ~-1 kcal/mol. This is equivalent to ~1/4 of the FMN energy bonus.

For the ON-switch, the behavior of the switch is similar. There is a sweet spot between 2 and 4 kcal/mol (favoring the OFF-state). A good idea is probably to aim for 3 kcal/mol, or 3/4 of the FMN energy bonus.

In other words, 1/2 of the FMN binding bonus is a good starting point for the energy difference, but I would recommend tweaking it:
Go lower for an OFF-switch (e.g., Exclusion 1)
Go higher for an ON-switch (e.g., Same State 1)

For the current puzzles, the tweak should be somewhere between 0 and 2 kcal/mol. This, in practice, compensates for the binding energy from MS2. We don't include this binding energy at this time for several reasons (see below) but may do so down the line.
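Johana's ΔG targets can be explored with a toy two-state Boltzmann model. This is only a sketch: the RT value assumes ~37 °C, the 4 kcal/mol FMN bonus is the value assumed in the PDF, and the real behavior also depends on the MS2 concentration:

```python
import math

RT = 0.616  # kcal/mol at ~37 C (assumption)

def frac_on(delta_g, fmn_bonus=0.0):
    """Fraction of molecules in the ON (MS2-binding) state for a two-state model.

    delta_g = G(ON) - G(OFF) in kcal/mol; negative values favor the ON state.
    fmn_bonus (positive, kcal/mol) stabilizes the OFF state, which contains
    the folded FMN aptamer in an OFF-switch.
    """
    g_on, g_off = delta_g, -fmn_bonus
    w_on = math.exp(-g_on / RT)    # Boltzmann weight of ON
    w_off = math.exp(-g_off / RT)  # Boltzmann weight of OFF
    return w_on / (w_on + w_off)

# With the ~-1 kcal/mol tweak suggested above and a 4 kcal/mol FMN bonus,
# the population flips from mostly ON (no ligand) to mostly OFF (with FMN):
no_fmn = frac_on(-1.0)
with_fmn = frac_on(-1.0, fmn_bonus=4.0)
```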

These suggestions are, of course, based on modeling, and we are curious to see how well they hold true. The next round is open, so please try it out.

jnicol, Player developer

I think the problem is that the MS2 ligand binding actually provides more energy than we expected. Every design fluoresces with or without FMN (there are only a few exceptions that appear to be designed to inactivate the MS2 ligand). Therefore, a higher delta (with our current thinking) would only reinforce the MS2 binding which results in the fluorescence.

All designs were submitted with the MS2 sequence in the 'white' area. Most players believe that this is the best chance for success. However, if the MS2 binding energy was underestimated, then a good design should actually have shown a misfold with the MS2 sequence.

For the next round, I will add a bonus energy to the MS2 sequence, which will encourage players to disrupt the MS2 sequence while still having their design appear in the 'white'. Also, even with this change, don't be afraid to submit a design that is not all in the 'white'. Thinking differently may be the key to understanding this lab, and there are many, many slots to experiment with!

Please add your comments on any opinions for or against this idea, so that we can understand these new fluorescent labs better!

nando, Player Developer

@jnicol: does that mean that Johan has figured out what specific bonus value should be modelled in the puzzles? the max (-11.8 I believe) or less than that?

No matter the value, only adding a molecule to the puzzles won't be enough. For one, you can only do that in the 4 'Exclusions' targets, not in the 2 'Same State' ones.

The next problem is about how these energies should relate to each other.



This is what I came up with while thinking about this topic. Unfortunately, an EteRNA puzzle where a MS2 bonus is applied will still allow for solutions that do not match the above diagram.

In the end, the only proper way to encode a puzzle including a MS2 binding bonus would be a 3 state design:
  1. MFE, whose structure we don't care about, only that it doesn't have the MS2 hairpin
  2. structure in presence of MS2 protein, should have the MS2 hairpin active
  3. structure in presence of both MS2 protein and FMN in the solution, should no longer have the MS2 hairpin


The problem with the above is that we don't have code to support item 3 yet...

nando, Player Developer

I've just read Johan's scoring paper, and it would seem that the experimental procedure is such that MS2 protein concentrations are gently increased from ~0 to 3 µM. The measurement seems to be a "soft" one, which would imply that we don't need to (and thus shouldn't) take MS2 bonuses into account.

Since I'm no scientist, it would be best if Johan or Rhiju could confirm this.

rhiju, Researcher

right, nando -- I don't think we need to take into account an MS2 bonus, at least with Johan's current experimental pipeline of scanning the MS2 concentration.

I'll let Johan confirm.

If we end up testing at a fixed MS2 concentration then we should almost certainly define a bonus for the MS2 hairpin and even render MS2 binding in the game; as you point out we would need to update the game to handle this. Having such fixed input and output concentrations may be important in the future, as we're going to eventually reach a limit in the number of conditions we can experimentally test and we'll need players to design for specific conditions.

johana, Researcher

I confirm.

The current method reports a Switch Score (based on fold-change in Kd) that does not specifically depend on the MS2 concentration.

We chose this method initially since it allows us to capture switching over a wide range of concentrations and also to characterize the MS2 protein in our setting.

A "good" switch not only exhibits a large change in signal, but also does so at a specific concentration of MS2. In the future, we would like the switches to have an ON-state close to the Kd of the MS2 hairpin. This is reflected in the Baseline Score, which only fully rewards designs with a Kd less than 2X that of the MS2 hairpin (Baseline ratio < 2). A fold-change in Kd of 10X can still be reached while increasing the Kd in the ON-state only 2X (based on our theoretical examples). In this case, the "optimal" MS2 concentration is ~600 nM and the MS2 energy bonus 2.2 kcal/mol.

For our current scoring scheme, maximum scores can be obtained (in theory) for fold-changes in Kd>26 (1 kcal/mol energy bonus at the optimal concentration).
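For reference, the bookkeeping between an energy bonus and a fold-change in Kd is exponential. A small sketch (the RT value for ~37 °C is my assumption; the scoring thresholds above come from Johana's post, not from this code):

```python
import math

RT = 0.616  # kcal/mol at ~37 C (assumption)

def fold_change_to_ddg(fold):
    """Free-energy difference (kcal/mol) implied by a fold-change in Kd."""
    return RT * math.log(fold)

def ddg_to_fold_change(ddg):
    """Fold-change in Kd implied by a free-energy difference (kcal/mol)."""
    return math.exp(ddg / RT)

# e.g. the energy difference corresponding to a 26-fold change in Kd:
ddg_26 = fold_change_to_ddg(26.0)
```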

Eli Fisker

Highlight of more patterns to look out for

Since the MS2 hairpin showed a preference for where it wanted to be positioned in the sequence, I decided to check on switch labs with the FMN and TEP aptamers. For FMN aptamer sequences, there was much less of a pattern. However, for the TEP aptamer, there seems to be more of a pattern.

For now it seems that very specific complementary segments to both the C and the G stretch in the MS2 hairpin can be useful for making the switch happen. However, the MS2 C stretch seems to be the most important one to complement. A switch can more easily be made without a strong complementary segment to the MS2 G's. Several of the highest scoring switches had both segments.

My suggestion for this lab is to experiment with different positions of the segments of G's and C's that are complementary to the C's and G's in the MS2 hairpin.
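A tiny helper for finding such complements. This is a sketch: the MS2 hairpin sequence shown is the commonly cited one and is my assumption here, so check it against the puzzle before relying on it:

```python
def rna_complement(seq):
    """Watson-Crick complement of an RNA sequence, returned 5'->3' (reversed)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))

# Commonly cited MS2 hairpin sequence (assumption -- verify in the puzzle!):
MS2_HAIRPIN = "ACAUGAGGAUCACCCAUGU"

# The complement to the C stretch (the segment found most important above):
anti_c = rna_complement("CCC")  # "GGG"
```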

My suggestion for future labs, is to experiment with the positioning of the aptamer sequence in relation to the MS2 hairpin.

You can read more about this here:

Aptamer position in relation to MS2 hairpin

MS2 hairpin versus complementary segments

johana, Researcher

Very interesting reading!

Eli Fisker

Thx :)

salish99

All,
I combed through the results of the Same State 1 lab from round 1, in order to provide an overview of how to potentially improve scores. The analysis is posted here:
http://eternawiki.org/wiki/index.php5...
Any comments welcome.

salish99

Summary of findings and conclusions so far

1) Only one design successfully reflected the actual shape required by the lab (Same State 1 by salish99).

2) Only two designs (Same State 1 and Same State 1NO by salish99) had a bound molecule in the actual lab using the actual FMN docking shape set by the design.
How did those two designs fare? Their folding score is mediocre at best. The correct shape "Same State 1" also seems to have little fold change in Kd,obs. One conclusion one could draw from this is that achieving the actual desired shape does not result in a good molecule. Or, the binding energy provided by the correct FMN docking is insufficient to boost actual binding.

3) Leaving a long set of A's (I will label these "stretches of desert") untouched may result in desired shapes. Such deserts simply won't bind well (except maybe to the few U's strewn around). However, the overall score of all these molecules is in the lower midrange of all the submissions analyzed here. So, leaving a maximum of 5 A's in a row is desirable, if not even fewer.

4) Maximizing the energy difference between the states seems to result in higher scores, as the FMN seems to really bind well, even if the target shape does not coincide (even remotely) with the predetermined binding site. Ultimately, all submissions that were created using this strategy had a vastly different shape than the one required, but all score better than average, with a maximum of 80/100.

5) However, a high fold-change in Kd,obs alone also does not necessarily result in a good binding site. Take, for example, 4736272_48166_2 by jnicol, which has a fairly high delta Kd,obs but scores low (48). With an average set of values (GC 11 (55%), AU 9 (45%), GU 0 (0%), ΔG = -33.5 kcal), this design also does not fall outside the average submission values. Its Tmelt = 107 °C is around the maximum of the submitted designs, so another rule to maximize the score seems to be to keep the melting point between 67 and 87 °C.

6) Geometric arrangements of non-A's, as in Mod of Eli's Mod by jandersonlee, seem to result in a very low folding score and, for some reason, no differentiation between state energies. So, another conclusion seems to be to avoid regular arrangements of blocker (i.e. non-A) bases.

Omei Turnbull, Player Developer

This is great! I haven't had the time to study it yet, but it is really helpful that you've organized the data and images in this way.

If you're up to it, I have another suggestion/request. If the individual sub-scores were added as additional columns, it would help make the connection between the chart with the sigmoid curves and the scoring.

johana, Researcher

Wonderful summary of Same State 1!

This is great for visualizing the puzzles. As you noticed, the predefined shape does not necessarily lead to the best switching. This is not surprising, since we don't really know beforehand what the best shape is. Your document will help in figuring out what it may be.

For the next round I therefore want to emphasize that the predefined shape is merely a starting point that includes the required elements (the MS2 hairpin and the FMN aptamer) at given locations within the sequence. Exploring different shapes and folding patterns will be very interesting.

Thanks again for the analysis!

salish99

you're welcome, Johan - let's hope we one day get small icons of the predicted shape in a column in the lab data outcome table :-)

Brourd

A few general questions to be saved for posterity here in this getsat thread.

1. What is the maximum theoretical yield of the experimental protocol, and the actual yield we can expect? In addition, what is the theoretical and actual turnaround for the experiment?

2. For this experiment, what are the expected errors involved with the measurements? What is the expected deviation in data for identical sequences between rounds?

johana, Researcher

See the following sheet for errors from the fits:
https://docs.google.com/spreadsheets/...

johana, Researcher

Thanks for asking some great questions!

1. By yield, I assume that you mean the number of designs that can be tested in one round.

Theoretical yield: The current protocol is primarily limited by the number of sequences we can order in one oligopool synthesis. That number is currently 92,918 (http://www.customarrayinc.com/oligos_...).

Actual yield: Our experience so far, from both the EteRNA pilot and other experiments in our lab, points to an actual yield higher than 95%. In these experiments we duplicated the sequences on the chip, effectively reducing the number by 2X or 10X. However, we did measure 96% of 46000 sequences so we are hopeful that we will consistently get 95% yields in future EteRNA rounds, even when sequences are not duplicated.

Theoretical turnaround: 2.5 weeks (1 week synthesis, 2 days PCR for sequencing library preparation, 1 day for sequencing, 4 days of data collection, 4 days of data crunching).

Actual turnaround: Expect a month. Things break, experiments sometime fail, small steps take longer than anticipated, and even scientists need to sleep. We hope you understand :-).

2. Excellent point. The fit errors were not reported in the figures, for clarity (there was already a lot of text), but I will try to post them in a spreadsheet online.

The figures with the curves also show each data point as a light-colored dot. As you notice there is a large spread between the RNA clusters. We use the median values for fitting.

The round-to-round variation is currently unknown, but we hope to repeat the pilot round next time to quantify this. Luckily, those sequences do not need to be included in the synthesis. We are still optimizing the protocol and will, for example, increase the laser exposure next time to achieve a higher signal. Our hope is that the normalization by the internal MS2 control will take care of some of the reproducibility issues associated with this and other currently uncharacterized sources of variation.

Eli Fisker

Johan, I love your explanation of the known phenomenon, the difference between theoretical and actual lab data return time.

"Things break, experiments sometime fail, small steps take longer than anticipated, and even scientists need to sleep. We hope you understand :-)."

Brourd

I hear ya about the sleep being needed!

Thank you for the clear and concise answers, Dr. Andreasson.

salish99

Summary of findings and conclusions so far (not sure where my original post went; I can't see any comments either, probably blocked by NoScript or ABP). The full summary is in my earlier comment above.

Eli Fisker

Thx for your fine analysis, Salish.

Your post has folded, so it seems gone, but when multiple comments get added to a post, the middle ones fold up and become invisible.

http://prntscr.com/5pxqb5

3) I can confirm your observation. I stumbled on two designs that were identical, except that one had a few non-A bases in the stretches. The one with long unbroken stretches of A's scored 3 points lower than the one with some spacers.

https://docs.google.com/document/d/15...

Great point on the too-regular placement of non-A bases being bad.

jandersonlee

A deeper analysis of what seemingly works to deal with the hook and/or large spans of A's in loops would be very useful!

salish99

According to my document the following works well. Not sure if that's a good generalization, though:
1) G's on the first and last base
The 3-5 A's following that could have any (non-binding) base, and won't influence the quality.
2) Binding materials
Examples: As we do not have to keep the shape as prescribed, there are options such as:
CCAUGUxxxxAUAUGG (or any other binding pairs) - take care that the stuff in the loop (xxxx) doesn't bind elsewhere.
These could also be interspersed with AA-AA sections to make bulges.

Note that in the design I used to reproduce the proposed structure 1:1, I only used G's to interrupt the stretches of A's, and the resulting score was low.

salish99

What I attempt in some of my designs for round two is to repeat the FMN binding dock sequence once, twice, or thrice, or in parts, to see how this influences the score. Let's see how that goes.

salish99

Ah, I wanted to ask something else, Johan.

You speak about having left the experimental error out of the graphs as there was already so much text.
1) Focusing on the NCI graphs, they already sport this large spread in all results except 3-4 (see comments in the wiki file) - is that indicative of fluorescence? First, I thought that was the experimental spread, and the lines drawn in red and blue were the average values, but apparently that wasn't the case if you included no error at all? Or did you mean that by saying "As you notice there is a large spread between the RNA clusters" about the light colored dots?
1B) If they are spreads, then why do some results have no spread at all? And is that indicative of good quality or bad quality of the RNA shape?
2) How do you calculate the error, using LLS? Could we overlay a 2σ confidence interval on the graphs to visually see the spread? I.e., if that already covers 0-2, the data mean will be mean-ingless (pun unintended); if it barely exceeds the breadth of the chosen linewidth, it gives us a pretty good idea that the results are precise.
3) How many copies of each RNA are being produced (both in the traditional methods and on the fluorescing arrays)? 1 each? 1 mol each? 1 tethered square micron each? If 1, I'm not sure that the results are representative. Or is this 1 oligopool synthesis block (92,918 copies of each submitted and synthesized design)?
4) How are the RNA strands tethered to the glass for the fluorescence measurements? Do you use thiolipids to bond them to a thin gold surface sputtered on the glass?

johana, Researcher

Thanks for your questions. I will try to clarify.

1) In the graphs, each faint data point is a measurement of a cluster on a sequencing chip. The large, fully colored dots represent the median values, and the curves indicate the least squares fit to those median values. The errors I was referring to were the errors from the fit for the fit parameters, i.e., the errors for the Kd and Fmax. These in turn will give you the errors in the fold-change in Kd.

For this round we used the median values in a non-weighted nonlinear least squares fit. We have found that the median is a simple way to reduce the impact of outliers that introduce spurious signals. Technically, however, this least squares fitting assumes normally distributed values with equal variance so in the future we may instead do single cluster fits and/or bootstrapping for more representative error estimates.

1B) More data (a larger number of data points) is almost always better, since it improves accuracy and evens out fluctuations. There is always some bias during PCR, cluster generation, and RNA synthesis, so the number of clusters per sequence varies quite a bit. The figure below shows the number of sequences (y-axis) that have a certain number of clusters (x-axis). As you can see, the average for this round was about 1100 clusters per sequence, but there are plenty of sequences with very few clusters.


Clusters with a lot of data points look like they have a large spread, since there is a higher chance of having outliers (check out the MS2 control as an example). This is mostly good, since there is a lot of data. The important value is the median. The absence of a spread suggests that there are only one or a handful of points at each concentration which is not as good. Some of the designs with bad scores may have suffered from this kind of lacking data.

Perhaps it would be useful to report the number of measured clusters for each design. In the future I will try to include that, together with the estimated fit parameter errors, in a separate file.

2) The parameter errors were calculated in the standard way for least squares fits (i.e., the errors are the square roots of the diagonal elements of the variance-covariance matrix). The fits are done using deltaG as a free parameter, so there is a transformation when converting to Kd and fold-change. Overlaying the curves could be a good idea. It may look messy but it's worth a try.
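For the curious, the median-then-fit idea can be sketched in pure Python. This is a toy grid search standing in for the real nonlinear least squares fit, with invented data; the actual pipeline fits deltaG and reports covariance-based errors as described above:

```python
import statistics

def fit_kd(conc, median_signal, fmax=1.0):
    """Toy fitting step: pick the Kd (grid search on a log scale) that best
    explains the median signals under a 1:1 binding model.

    Stand-in for the real nonlinear least squares fit; for illustration only.
    """
    def sse(kd):
        # Sum of squared errors between model and median signals.
        return sum((fmax * c / (c + kd) - y) ** 2
                   for c, y in zip(conc, median_signal))
    # Log-spaced Kd candidates from 0.01 nM to 10,000 nM.
    candidates = [10 ** (e / 50.0) for e in range(-100, 201)]
    return min(candidates, key=sse)

# Synthetic data with a true Kd of 100 nM, five identical "clusters" per
# concentration; the median of each cluster set feeds the fit.
conc = [1.0, 10.0, 30.0, 100.0, 300.0, 1000.0, 3000.0]
clusters = [[c / (c + 100.0)] * 5 for c in conc]
medians = [statistics.median(cl) for cl in clusters]
kd_hat = fit_kd(conc, medians)
```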

3) In the pilot round, each sequence was synthesized on 10 spots on the DNA synthesis array. Each of those spots has plenty of DNA molecules (maybe ~1 billion). These DNA molecules are then amplified by PCR and clustered on the sequencing chip. The clusters are used in the fluorescence experiments. There are 300-1000 RNA molecules per cluster, which covers an area smaller than a square micron.

4) The details are given in our original paper (http://greenleaf.stanford.edu/portfol...). The idea is illustrated below.


In brief, the glass surface is covered in a thin polyacrylamide layer with covalently attached DNA oligos. During sequencing, these oligos capture the DNA molecules of interest and amplify them by bridge PCR, resulting in clonal clusters of double-stranded DNA. RNA polymerase transcribes the RNA directly from these DNA molecules. Thanks to a biotin-streptavidin roadblock, the RNA polymerase stalls at the end of the template with the RNA still attached.

For the traditional EteRNA experiments there were millions or billions of molecules being cleaved in solutions, so it was a very different kind of experiment.

salish99

Ah, interesting. I worked on tethered ionophores in the past, thus my question.


How can you be sure the DNA stands up and is not bent so that the RNA would contact the surface again?

johana, Researcher

Those thiolipids look cool!

The short answer to your question is that we don't know the exact orientation of the DNA. The assumption is that it is random and that it doesn't matter, but it is something we are thinking about. In particular, proteins sometimes "crash out" on the surface, which could limit the number of times we can reuse the sequencing chip.

Meechl

I was unable to resist the lure of the MS2 data any longer. Naturally, I made a spreadsheet:

MS2 Spreadsheet

Unfortunately, it doesn't have all the data I'd like to add to it, such as the theoretical folding shape and the number of AU/GU/GC pairs in the second state, but one should be able to make some pretty graphs for the free energy, melting point, and such. I myself should probably be working on some other things though, so I have no graphs to share... for now. :)

Eli Fisker

Beautiful :)

Big thx, Meechl!

johana, Researcher

This is great.

We are working on more details and Nando gave me a link to a script that should return the energy of the two states. If you feel adventurous you could try it:
http://nando.eternadev.org/web/script...

Omei Turnbull, Player Developer

Meechl and salish99 - I've been working on getting something similar, with my main objective being the switching charts.

Although I'm using a spreadsheet as an interim step, the end result will be a Google fusion table. One of the many cool features about fusion tables is that they can be "merged", which is the equivalent of doing a join on SQL tables.

It's easy to move data between fusion table and spreadsheet formats using .csv files. So instead of collecting all the data fields I have been, I am going to concentrate only on the ones I need that Meechl doesn't already have in her spreadsheet. When I'm done (hopefully today), I'll convert them both into fusion tables, merge them, and publish the result.

I'm bringing this up now, because it could serve as a more general mechanism for collaboration on gathering data. If anyone makes a spreadsheet where one column is the Eterna solution ID (e.g. 4789644 for Helter Skelter 2), it can easily be merged with everything others have collected.

Once data is in the merged fusion table, anyone can do many cool things with the stock fusion table UI. But even more, tool builders can use the RESTful API automatically available for fusion tables to quickly access whatever part of the data is of interest to them, and then focus their effort on exploring more customized, Eterna-specific presentations of the data.
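The merge-on-shared-ID idea Omei describes can be sketched in plain Python as a left join of two row sets on a solution ID column (fusion tables and SQL do this natively; the field names below are hypothetical, not the actual table columns):

```python
# A minimal sketch of merging two player datasets on a shared solution ID,
# the way fusion tables "merge" or SQL joins tables. Field names are made up.
def merge_on_id(rows_a, rows_b, key="solution_id"):
    """Left-join rows_b onto rows_a by the given key column."""
    index_b = {row[key]: row for row in rows_b}
    merged = []
    for row in rows_a:
        combined = dict(row)
        match = index_b.get(row[key], {})  # empty match keeps the row as-is
        combined.update({k: v for k, v in match.items() if k != key})
        merged.append(combined)
    return merged

# Two hypothetical player spreadsheets, keyed by solution ID
meechl = [{"solution_id": 4789644, "gc_pairs": 12}]
omei = [{"solution_id": 4789644, "switch_score": 74.0}]
print(merge_on_id(meechl, omei))
```

The key design point is the one Omei makes: as long as every spreadsheet carries the solution ID column, any two of them can be combined this way without coordinating the rest of their columns.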
Photo of salish99

salish99

  • 295 Posts
  • 58 Reply Likes
nice work.
Just to confirm - did you mean to say "one column is for one solution", not one row?
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 972 Posts
  • 305 Reply Likes
Each row should hold the data for one solution, yes. But one of the spreadsheet columns should be for the solution ID, since that's what is needed for merging the data.
Photo of Eli Fisker

Eli Fisker

  • 2236 Posts
  • 495 Reply Likes
A few ideas for designing

- For those who lack ideas for making designs for the MS2 lab, I suggest either modifying a design from last round or modifying designs submitted in this round.

- A design with only minor changes, compared to one with a known score, is very helpful for analysis and for finding out what really works, as one can see which small changes improve or worsen things.

- I have been making around 5 to 10 mods of a design from last round. The idea is to build a small cluster of data around a design with a known score. That way I have something to compare against that might tell me what I'm after knowing. It's sort of a way to test multiple hypotheses from a known spot.
Photo of Eli Fisker

Eli Fisker

  • 2236 Posts
  • 495 Reply Likes
Test the same ideas in many different designs

I often end up making somewhere between 5 and 20 mods of a design, either one I modify from another player or one I start from scratch. I think of these sibling designs as clusters.

I expect my small data clusters to be helpful for comparison. So instead of comparing hundreds of different designs with each other, I am testing multiple puzzles with the same set of ideas.

I'm testing many of the same things in different designs, in the hope of learning whether e.g. a segment does well in several different designs, and is not just a coincidence in one.

So basically two designs become two sets of data clusters, but with many of the mods in common.
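Eli's "cluster" approach of small mods around a known design can be sketched as generating all single-base variants at unlocked positions (the parent sequence and locked positions below are toy examples, not a real lab design):

```python
# A minimal sketch of building a "cluster" of designs around a parent with a
# known score: every single-base mutant at an unlocked position.
# The parent sequence and locked set are hypothetical.
BASES = "AUGC"

def single_mutants(seq, locked):
    """Yield (position, new_base, mutant) for every one-base change
    at a position not in the locked set."""
    for i, base in enumerate(seq):
        if i in locked:
            continue  # locked bases (e.g. aptamer, MS2 hairpin) stay fixed
        for b in BASES:
            if b != base:
                yield i, b, seq[:i] + b + seq[i + 1:]

parent = "GAUCGA"   # hypothetical parent design
locked = {0, 1}     # hypothetical locked positions
cluster = list(single_mutants(parent, locked))
print(len(cluster))  # 4 unlocked positions x 3 alternative bases = 12 mods
```

In practice one would submit only a handful of these, but comparing their scores against the parent's known score is exactly the "known spot" comparison described above.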
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 979 Posts
  • 308 Reply Likes
Fusion Table for MS2 Riboswitches on Chip (Round 1) (First Draft)

As I mentioned above, I've created a fusion table Merge of Meechl's MS2 (1/11/15) and Omei's MS2 (1/11/15) that contains all the data in Meechl's spreadsheet, plus more that I added in order to incorporate the switch graphs.

Here's a couple of screenshots that are interesting (to me.)





Feel free to play around with the table. You can try anything you want, up to adding, deleting and changing the data. The thing is, you won't be able to save the changes. But what you should be able to do is export the entire table as a .csv file and re-import it into a new fusion table that you own.

I'm trying a number of new things here, so I'm sure there will be issues. But I intend to continue work on it, so let me know what you find.
Photo of nando

nando, Player Developer

  • 388 Posts
  • 71 Reply Likes
"... does not have the permissions to view the requested table"
Photo of salish99

salish99

  • 295 Posts
  • 58 Reply Likes
Excellent work. Shall we all send you our Google handles to get access permission?
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 979 Posts
  • 308 Reply Likes
Nando and Salish - it should be completely public now; let me know if you are still having problems.
Photo of nando

nando, Player Developer

  • 388 Posts
  • 71 Reply Likes
working now, thanks :)
and great tool, thanks for that too :)
Photo of Eli Fisker

Eli Fisker

  • 2236 Posts
  • 495 Reply Likes
Omei, this is simply amazing. :)

Big thx!
Photo of Eli Fisker

Eli Fisker

  • 2236 Posts
  • 495 Reply Likes
Distance between aptamer and MS2 sequence matters

Hypothesis - It's not good to have the aptamer sequence right next to the MS2 hairpin sequence.

I have said earlier that I thought it was bad for the aptamer sequence to be next to, or almost next to, the MS2 hairpin sequence. Now I think I might also be able to explain why.

There were 6 labs in round 1. In 4 of them, the aptamer sequence was next to, or one base apart from, the MS2 sequence.

The 2 labs that did best had in common that their aptamer sequences are well apart from the MS2 hairpin. Far apart is good, and too close can be bad.

I don't think aptamer sequences always have to be that far apart; even one or two more unlocked bases will help in making better choices for the switching segment, as they give the option to make a stronger switching segment.

1 base gap between aptamer and MS2 sequence

Here are two examples of labs where the MS2 sequence is only one base apart from the aptamer sequence, among them the worst scoring of the 6 MS2 labs, Exclusion 1.

Color explanation

Orange box = hairpin
Green boxes = aptamer sequences
Blue boxes = segments involved in switching

(The hairpin and aptamer sequence are locked)

Exclusion 1


Exclusion 4


While these two labs have different MS2 sequence positions, and one has the aptamer sequence before the hairpin and the other after, what they have in common is that there is only one single unlocked base between the aptamer sequence and the MS2 hairpin sequence.

What these labs also have in common is a much more repetitive solving pattern for the sections I highlighted in blue, to indicate that these would often be involved in making the switch happen and pair with each other.

In the lab Exclusion 4, many of the best scoring designs use the two aptamer G’s and add a G at the only unlocked spot, so the design gets its strong G segment too, with 3 G’s in a row. In the Exclusion 1 lab, which scores less well, this can’t be done.

Zero gap between aptamer and MS2 hairpin

What the labs with a 0 base gap between aptamer and MS2 hairpin have in common is having almost identical solves for the switching segments (blue highlighted columns).

Exclusion 2 has 0 bases between MS2 hairpin and aptamer sequence.


Notice there is no variation in the last switching segment, and little variation in the first. The two high scorers have the exact same solve.

Exclusion 3 has zero gap between aptamer and MS2 sequence. It has no variation in the first switching segment and little in the last.


Here is the top scoring design, Web mod by Jandersonlee, from the lab Exclusion 2. I highlighted the switching segments. Notice that the locked bases next to the aptamer limit the options for varying the solve to make the switch happen. I bet this one works mainly thanks to the switching push from the G segment added right before the MS2 hairpin (marked with black rings).

Web mod, 74%

http://eterna.cmu.edu/game/browse/473...

Like this one, the better scoring designs generally attempt to create complementary segments elsewhere.

Since a G segment can’t be placed between the aptamer and the MS2 sequence, many of the best scoring designs place it before (highlighted in blue below).



Conclusion and perspective

A 0 or 1 base gap between the aptamer and MS2 sequence simply limits, greatly, how the switch can be solved. The best labs have a far bigger variance in their solves and in the positioning of the G and/or C segments.

My point is that with even just one or two more unlocked bases between the aptamer sequence and the MS2 hairpin sequence, we would get a much wider range of options for making more and stronger variations of the switch making segment. This will help get the switching score up.

I’m starting to think that the aptamer's distance to the MS2 hairpin sequence matters a great deal, perhaps even more than the overall MS2 hairpin position in the sequence. What characterizes both Same State 1 and Same State 2, the labs with the best average switching scores (thx Meechl and Omei!), is that they have a lot of air between their aptamer sequences and their MS2 hairpin. But one has its hairpin positioned early, and the other has it in the middle. So it looks like aptamer distance trumps MS2 position, while the latter still matters too.

I nicked a part from Omei’s data screenshot that nicely demonstrates the difference in switching abilities between the labs:


https://getsatisfaction.com/eternagam...

I also think it matters which part of the aptamer sequence sits close to which side of the MS2 hairpin. Not every positioning and distance is equal, especially if there are zero or few bases between the aptamer sequence and the MS2 sequence.

It matters if the aptamer G's are readily available for making a switching segment.

Thx to Machinelves for input on how to better present my idea.
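The gap statistic Eli is reasoning about here is easy to compute for any design: count the unlocked bases between the two locked domains. A minimal sketch, using placeholder sub-sequences rather than the real aptamer and MS2 hairpin sequences:

```python
# Sketch of measuring the gap Eli discusses: how many bases sit between
# two locked domains (e.g. aptamer and MS2 hairpin) in a design string.
# The domain strings in the example are toys, not the real locked sequences.
def gap_between(design, domain_a, domain_b):
    """Bases between two non-overlapping sub-sequences, in either order.
    Returns None if either domain is absent."""
    i, j = design.find(domain_a), design.find(domain_b)
    if i < 0 or j < 0:
        return None
    if i < j:
        return j - (i + len(domain_a))
    return i - (j + len(domain_b))

design = "AAGGGAAACCCUUAA"  # toy design string
print(gap_between(design, "GGG", "CCC"))  # 3 bases ("AAA") between them
```

With the scores merged in (see the fusion table discussion above), one could then check whether this gap correlates with the switch subscore across labs.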
Photo of nando

nando, Player Developer

  • 388 Posts
  • 71 Reply Likes
Maybe it wasn't very obvious, but the choices made in the Exclusion targets were very much on purpose. Also, I don't think it is valid to compare the results between "Exclusion" and "Same state", since their parameters were entirely different.

In the "Same state" targets, the MS2 hairpin is supposed to form with the aptamer. And while we ignored this factor during the design phase, you must understand that the MS2 binding was helping that state with a variable additional bonus. Which may at least partly explain the comparatively better switching statistics.

On the other hand, the problems with the Exclusion targets were much more complex. Not only was the MS2 bonus playing against the switching, but we also tried to make sure that the designed sequences would have as low a chance as possible of actually adopting a conformation where both the MS2 hairpin and the aptamer would be present at the same time. The simplest way to guarantee this condition is to make the respective domains either overlap, or (since that was not really possible here) to place them as close as possible, so as to make that unwanted possibility as remote as possible. If you don't try to eliminate this case, you may end up with sequences that do switch, but the experiment can't see it, because the MS2 protein is still sticking to the MS2 hairpin.

I'm pretty sure we will soon get many opportunities to test other sequences with different parameters. Though I strongly suggest that anyone planning on creating new Exclusion puzzles think very carefully about them, because I don't believe it's as simple as just adding a few bases between the respective domains.
Photo of Eli Fisker

Eli Fisker

  • 2236 Posts
  • 495 Reply Likes
Hi Nando!

Big thx for your thoughts and additional insights. I did overlook the difference between the Same state and Exclusion labs.

I agree that just adding a few bases between the respective domains won't solve it all. It is more complicated.
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 979 Posts
  • 308 Reply Likes
Johan, I've been perusing the switch graphs using the fusion table with the intent to build some intuition for interpreting the graphs. What I have discovered is a desire to be able to define filters on the five statistics in the upper left corner of each graph, in addition to the scores that are derived from them. These stats would also be useful for examining correlations, since they retain more of the fundamental information than the scores derived from them.

If you could make available a spreadsheet with those stats, just as you did for the scores and subscores, I would add them to the fusion table.
Photo of johana

johana, Researcher

  • 96 Posts
  • 45 Reply Likes
Omei, the fusion tables are great and I've been following the great analysis done by players. I have finally uploaded a spreadsheet containing not only the EteRNA scores, but also the fit parameters with associated standard errors from the fits.

https://docs.google.com/spreadsheets/...
Photo of Eli Fisker

Eli Fisker

  • 2236 Posts
  • 495 Reply Likes
I have been seeing some of the red and green MS2-complementary segments I have been talking about lately spread throughout the design. E.g., the G segment that is supposed to pair with the C's lining up in the MS2 hairpin sequence is sometimes countered by a mirror C segment after the G segment. Note that the mirror C segment is not supposed to pair up with the G's inside the MS2 hairpin. Rather, I think it is there so that when the MS2 hairpin needs to fold, the G segment and its mirroring C segment can pair with each other. Thus the G segment stays out of the way when not needed.

Example puzzle by JL from the current dimer lab:


http://jnicol.eternadev.org/game/brow...

This is what I have been seeing in multiple designs, especially in some of JL's designs for the current dimer, and to a smaller degree in MS2 too.

As Brourd pointed out, some of the repeats come from the locked bases in the structure, like the aptamers, and here also the MS2 hairpin and micro RNA - for example the repeat C and G segments that come from the MS2 hairpin sequence. Other repeats, I think, relate to the puzzle being a switch and needing to move.

Earlier I suspected that the periodic repeats I saw in switches were bad. And I still think too much of it is, but I think it might also be helpful to a certain degree.

I see the repeats particularly in the dimer labs, but we don't have data back for those yet. Since I think this may be of help, I would like to highlight this earlier forum post and discussion about periodic repeats in switches, in the hope it will aid our designing and analysis.

Periodic repeats in RNA switches
Photo of Eli Fisker

Eli Fisker

  • 2236 Posts
  • 495 Reply Likes
Optimal range of different bases in relation to each other

I believe there is an optimal % range of G's, A's, U's and C's for the good designs in the MS2 labs. I went through 10 of the top scoring designs from round 1 in the MS2 lab and noted their base distributions. For some bases (C and G) those ranges were quite narrow and close in number, while the ranges of A's and U's varied more.



- G’s have a narrow range (15-21%)

- C’s have a narrower range than G’s (10-15%)

- U’s range varies the most (6-24%)

- A’s are the most prevalent base, covering a range from 49-65%

- The high scorers generally have more U’s than C’s.

- The high scorers generally have more G’s than C’s.

- The high scorers generally have an equal amount or a higher % of A's than all of the other 3 bases together.

Around 37% of the bases are fixed, due to the locked bases.

I expect there to be slight variation on a sub-lab basis. I also expect the ranges to widen in the next round. Still, I believe there is some optimal range for the highest scoring designs, which will be useful for weeding out bad solutions without looking at base pairs, which we can’t know anyway. The base content, on the other hand, is fixed.

I believe we can use the knowledge that C’s will generally be less numerous than G’s, and that A’s will generally be more numerous than most of the other bases together.

I simply think we can use these base % relationships to our advantage in picking out winners. The truer range will reveal itself soon, when we get more data back.
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 978 Posts
  • 308 Reply Likes
Eli, I think this is a good thing to be investigating. Independent of Meechl, I added these stats to the merged fusion table.

Something to think about is that if a statistic, or range of statistics, is proposed for picking out good designs, it needs to not only pick out high scoring designs but also reject low scoring designs. With that in mind, here's a graph of all the designs from Round 1, with each of the base percentages plotted against the Eterna score.



The general impression it gives me is that the ranges for high scores are not all that different from the ranges for other scores. But this is with all designs lumped together; looking at filtered subsets of designs may show something else entirely.


Also, the fusion table has a lot more designs with scores of 71 or better. Did you manually exclude some of the designs because of obviously low data counts? Or maybe the fusion table has some invalid records in it; I haven't checked it carefully. If you find any, let me know.
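Omei's caveat - that a useful statistic must reject low scorers, not just describe high scorers - can be made concrete by comparing the statistic's range in the top designs against the rest. The (score, A%) pairs below are made up for illustration:

```python
# Sketch of Omei's test: does a statistic's range in high scorers actually
# differ from its range elsewhere? Data points here are hypothetical.
def range_of(values):
    """(min, max) of a list of values."""
    return min(values), max(values)

# Hypothetical (eterna_score, A_percent) pairs for a handful of designs
data = [(79, 52), (75, 55), (72, 50), (40, 51), (35, 58), (20, 49)]

high = [a for score, a in data if score >= 70]
rest = [a for score, a in data if score < 70]

print("A% range, high scorers:", range_of(high))
print("A% range, the rest:   ", range_of(rest))
# If the two ranges largely overlap, A% alone cannot reject bad designs.
```

On this toy data the ranges overlap almost completely, which is exactly the "not all that different" pattern Omei reports for the real round 1 data.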
Photo of salish99

salish99

  • 295 Posts
  • 58 Reply Likes
nice
Photo of Eli Fisker

Eli Fisker

  • 2236 Posts
  • 495 Reply Likes
Hehe, wow!

I went to bed having posted a hideous handwritten note, and I woke up to this.

Awesome spreadsheet and fusion table enhancements, beautiful graphs and thoughtful analysis.

You guys are amazing! :)
Photo of Eli Fisker

Eli Fisker

  • 2236 Posts
  • 495 Reply Likes
Omei, you asked where I found the top scorers and why I didn't find more.

I got them from the total lab overview, where all the sub labs are shown in one lab:

http://eterna.cmu.edu/web/browse/4736...

And I can see that I somehow got my sorting wrong, as I didn't see the complete set of designs scoring over 70 that actually was included. Thx for catching this.
Photo of salish99

salish99

  • 295 Posts
  • 58 Reply Likes
top score was 79.96
Photo of salish99

salish99

  • 295 Posts
  • 58 Reply Likes
I liked the graphing. I did an in-depth analysis of the data, first with the base pairs:
GU:


and GU separated by experiment:


GC:


GC separated by experiment:



AU:


AU separated by experiment:


The outliers always represent designs from the same state experiments.
Ranges of preferable AU/GU/GC pairings can be found in the graphs.
Future work:
1) Relate all 3 subscores to the pairings
2) Make ratios of AU/GU, AU/GC, and GC/GU and compare these to the overall score and subscores.
Photo of Eli Fisker

Eli Fisker

  • 2227 Posts
  • 488 Reply Likes
Salish, I like your graphs. :) They are great.

You show that the absolute high scorers have a rather fixed range for how many GC and AU pairs they use, which is very helpful to know. And next round, with a ton of data, you will be able to pinpoint the optimal range with even greater certainty.

Good work!
Photo of salish99

salish99

  • 295 Posts
  • 58 Reply Likes
Then, let's take a look at individual base use, again once for the entire set of all data, and then separated out into experiments:

C%


C% separated:


G%


G% separated:


U%


U% separated:


A%


A% separated:
Photo of Eli Fisker

Eli Fisker

  • 2227 Posts
  • 488 Reply Likes
Salish, you demonstrate it very clearly.

There is such a thing as too many G's in a design, and likewise too many C's or too many U's.

And there is a base ratio that is very good.

Beautiful work!

By the way, your statement about which labs were most often in the bad range gave me an idea.

It got me thinking that perhaps this might be related to the design structure and the position of the MS2 hairpin. For the C% separated, two of the 3 labs mentioned there had an early positioned MS2 hairpin (EX1 and SS1).



For now there is way too little data to conclude anything on this, but it might be interesting to keep an eye on.
Photo of salish99

salish99

  • 295 Posts
  • 58 Reply Likes
To put it in numbers:
C% must be < 20% and should be 10-14%
G% must be < 30% and should be 15-25%
U% must be < 20%
A% must be 40-60%
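These numeric ranges translate directly into a screening function. A minimal sketch that checks only the hard "must" limits (the "should" ranges are preferences rather than cutoffs, and the example sequence is synthetic):

```python
# salish99's hard base-% limits as a design screen. The "should" ranges
# are softer preferences and are not enforced here.
def base_percentages(seq):
    """Percentage of each base in the sequence."""
    n = len(seq)
    return {b: 100.0 * seq.count(b) / n for b in "ACGU"}

def passes_hard_limits(seq):
    """Hard limits: C% < 20, G% < 30, U% < 20, A% in 40-60."""
    p = base_percentages(seq)
    return (p["C"] < 20 and p["G"] < 30 and p["U"] < 20
            and 40 <= p["A"] <= 60)

# Synthetic 100 nt sequence: 50% A, 20% G, 12% C, 18% U
seq = "A" * 50 + "G" * 20 + "C" * 12 + "U" * 18
print(passes_hard_limits(seq))  # True: all four hard limits satisfied
```

Since base content is fixed regardless of folding model, a screen like this can be applied to candidate designs before submission, as Eli suggests above.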
Photo of salish99

salish99

  • 295 Posts
  • 58 Reply Likes
And let's take a look at the calculated free energy and melting temp

overall data analysis:


Free energy per experiment

To me, this is the most astonishing result. It clearly shows that the switch is the most important part, as super stable RNA scores very badly.
My interpretation is that RNAs with a highly negative free energy are very stable and won't like to refold into a different shape in the presence of a ligand. This is nicely supported by hard physical evidence here.
There are no molecules with free energies
For top scores (SS scored highest; they will always be in the top list), the melting point was 77 degC, but this in no way guarantees high scores.

MP separated by experiments

For an analysis of the Exclusion experiments, melting points should be either at 67 or 87 degC, as there is a saddle peak for this data set.
Overall, I wish the melting points were less discrete and more continuous.