Switch Scores for EteRNA Switch Puzzles

An exciting direction in EteRNA is the study of riboswitches!

We have recently finished our pilot experiments with great initial success. Using a new technique that measures switching directly on a sequencing chip, we can observe switching for thousands of designs at once. The signal is generated by a fluorescent RNA binding protein, MS2, and instead of the standard EteRNA score, which is based on the correct folding of each base, we have introduced a new Switch Score.

The Switch Score (0 - 100) has three components:
1) The Switch Subscore (0 - 40)
2) The Baseline Subscore (0 - 30)
3) The Folding Subscore (0 - 30)

The scoring scheme is summarized below. A more detailed description is given in this PDF:
https://drive.google.com/open?id=0B_N0OA9NROPGel80SG5LM0wtZms&authuser=0

A typical example of a switch puzzle is shown below:


The player designs the structures in [1*] and [2]. To observe the switching we then measure the fluorescent signal of MS2, which binds specifically to the MS2 hairpin seen in [2]. In the absence of FMN, the MS2 should bind and the switch is ON. On the other hand, if we introduce FMN, the ligand in [1*], the switch should be OFF and not exhibit fluorescence.

No switch is 100% ON or OFF in the absence or presence of ligand, but a good switch can come very close (and get a perfect EteRNA Switch Score!). At some MS2 concentration, the difference should be large (e.g., at ~100 nM MS2 in the figure below). In practice, we don't know this concentration beforehand, so instead we perform measurements at many concentrations to obtain binding curves. When the switch turns OFF (red curve), the effective dissociation constant increases. The dissociation constant, Kd, is the concentration at which half of the RNA binds MS2.


The Switch Subscore quantifies how far apart the Kd's are in the absence and presence of FMN (horizontal distance between the red and blue curves).

The Baseline Subscore is a measure of how close the ON-state is to the original MS2 hairpin (lower Kd is better, i.e., the blue curve should be far to the left).

The Folding Subscore is high if MS2 binds properly in the ON-state at any concentration (the score should be high for the blue curve at high concentrations of MS2, i.e., high values to the right).

In our first experiments, we found that the easiest score to maximize is the Folding Subscore, followed by the Baseline Subscore. These two ensure that the MS2 hairpin is properly formed in the ON-state. The hard one is the Switch Subscore, which is highest when the energy difference between the states is finely tuned to the energy conferred by binding to FMN (or other future ligands).
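To make the fold-change idea concrete, here is a minimal sketch. The Kd values are hypothetical, and the exact subscore formulas live in the linked PDF, not here:

```python
import math

# Hypothetical Kd values (nM), read off a pair of binding curves like
# the ones described above. These numbers are made up for illustration.
kd_on = 100.0    # ON state (no FMN): MS2 binds tightly
kd_off = 2000.0  # OFF state (+FMN): effective Kd has increased

fold_change = kd_off / kd_on  # how far apart the two curves sit

# On a log-scaled concentration axis, the horizontal distance between
# the two Kd values equals the log of the fold change.
log_distance = math.log10(kd_off) - math.log10(kd_on)
```

The larger the fold change, the further apart the red and blue curves sit, and the better the Switch Subscore should be.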
johana, Researcher

Posted 5 years ago

Eli Fisker
The MS2 hairpin in Sensor lab, MS2 lab and Logic Gate Puzzles

When I watched the new data for the sensor lab, many of the winners reminded me very much of the Inverted XOR puzzle and some of the MS2 labs. I basically saw the same pattern of non-A bases around the MS2 sequence that I had been seeing in the AND, OR and Inverted XOR puzzles. (The XOR puzzle is different, as it is possible to solve it with all the non-A bases in the first lane and none in the second.)

For a visualization of the Logic Gates puzzles, see here:

Vizualizing the Logic Gates



Inverted XOR

Basically, I have been wondering why there were all these non-A bases on both sides of the MS2 hairpin in the Inverted XOR puzzle.

Nando’s solve:


http://nando.eternadev.org/web/puzzle...

I think the non-A pattern on either side occurs because, while the MS2 needs to be turned on for sure, there also needs to be a way to break it. So one wants it to form, but with something that does not pair up too well. That is a role GU pairs fill well, splitting when given the least hint for it. And then sometimes a single GC pair gets added to keep things in place, like in Nando’s original Inverted XOR solve above (bases 93 and 121).

So the same sequence that helps turn the MS2 on when in company with another helps turn the MS2 hairpin off when it instead is paired up with the MS2 hairpin.



Turning on MS2

Basically, the two non-A base stretches just around the MS2 hairpin are the MS2 ON switch.

(MS2 Gate 1 and 2)


Most often, MS2 gets turned on by the sequences around it pairing up with each other. (OR, XOR, XOR inverted, Sensor V2, turnoff variant 2, Sensor for HSA-mir208a)


Turning off MS2

Example of MS2 (framed in purple) getting turned off by binding to the sequence after itself: state 4 in the Inverted XOR puzzle.



In the Sensor V2, variant 2, the base stretch after the MS2 hairpin is there to turn off the MS2 when it is not needed.

However, the MS2 hairpin can be turned off in several ways. Most often it is turned off by the sequence after it (XOR inverted, Sensor for HSA-mir 208a, Sensor V2, turnoff variant 2); often it is turned off by different parts of the design each reaching for its own end of the MS2 hairpin (AND, XOR, some of the EX and Same State labs); and sometimes it is turned off by the sequence before it (AND gate + OR gate).


MS2 Door Gates

Actually, when thinking about it, this pairing stretch before the MS2 hairpin plays a role similar to that of the strands that fold around the switching end of the aptamer. So these are not the aptamer gates, but the MS2 gates. Of course. :) Though I originally didn’t think the MS2 needed extra help to turn on, since I knew it was capable of turning itself on already, unlike the aptamer, which needed the closing base pairs at the switching end. However, when the MS2 gets countered by a complementary stretch that turns it off, the gates on either side of the MS2 help it turn on again.

Similar to the bases in the FMN aptamer that often are involved in the switching (in particular the twin G’s), the MS2 bases are involved in the switching, even more so. So MS2 and FMN aptamers are more alike than I had thought.

Typically there is a 3-4 base pair stretch in front of the MS2 sequence in the solves I have seen so far. EternaBot often doesn’t have these extra MS2 closing bases. Sometimes it has a 2 base pair one, or an internal loop in front of the MS2 sequence in its ON states, or a longer stretch of base pairs around the MS2. So basically not much of a pattern there.

Source: Wuami’s doc with eternaBot solves

Background about strands closing the switching area of the aptamer:

The Aptamer Matrix
jandersonlee
Out of curiosity I checked to see how various bases and sub-sequences correlate to cluster size. To eliminate outliers I used only sequences with clusters between 10 and 250 in size.

As Brourd has already shown, too many As seem to be harmful, with an increased A count negatively correlated with cluster size:



However the context seems to matter. When looking at length two sub-sequences, the worst offenders are AA and AG:



For length three, the worst offenders are AAA, GAA, AAG, AGA, and AAU:



While correlation is not causation, it seems that using large open loops with multiple As, optionally speckled with Gs or Us, may not be a good strategy for good cluster yields.
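A minimal sketch of this kind of analysis, using made-up toy data rather than the real lab results, and overlapping counts for the sub-sequence percentages:

```python
def ngram_pct(seq: str, ngram: str) -> float:
    # Percentage of (overlapping) windows of len(ngram) that match ngram.
    n = len(ngram)
    windows = len(seq) - n + 1
    hits = sum(1 for i in range(windows) if seq[i:i + n] == ngram)
    return 100.0 * hits / windows

def pearson(xs, ys):
    # Plain Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Toy (sequence, cluster size) pairs -- NOT real measurements.
data = [("AAAAGGCC", 12), ("GCGCGCGC", 180), ("AAAAAACG", 8), ("GGCCUUGC", 150)]
aa_pct = [ngram_pct(seq, "AA") for seq, _ in data]
clusters = [c for _, c in data]
r = pearson(aa_pct, clusters)  # strongly negative for this toy data
```

The spreadsheet presumably does the equivalent with formulas; the point is just that each sub-sequence becomes one column of percentages, correlated against the cluster-size column.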

The Google Sheet can be found here. Best viewed in Chrome on a machine with at least 8GB of RAM. (You have been warned.)
Omei Turnbull, Player Developer
I think that looking at these higher order frequencies is a great idea!

How did you calculate the digram percentages for a sequence? I checked your spreadsheet, and they don't seem to add up to 100%, as I would expect. In general, they add up to something less than 200%. It would be fine if they were all simply doubled, since it shouldn't change the correlation coefficients. But the actual sum differs among the sequences.

As a specific example, it looks to me like only one CC digram got counted in the sequence AUUUUACAUGAGGAUCACCCAUGUUUUGGCGGGCAGGAUAUAGAUCGGAUGAGUUCUGUCUAGAAGGGACAUGUU, where I would have expected the CCC trigram to contribute two. (This sequence is for JR_SS001_Sub511, the first on the sheet.)
jandersonlee
It's a quick hack using string substitution. CC and CCC match one CC. CCCC and CCCCC match two, and so forth. So for repeated/overlapping sequences it can be off. I'm looking for rough correlations, not absolute counts. Also, I think it would be better to correlate to ln(num_clusters) rather than num_clusters. Still playing with it.
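The gap between the quick-hack count and a fully overlapping count is easy to see in a few lines (a sketch, not the actual spreadsheet formula):

```python
def overlapping_count(seq: str, sub: str) -> int:
    # Count every occurrence of sub in seq, including overlapping ones.
    return sum(1 for i in range(len(seq) - len(sub) + 1)
               if seq[i:i + len(sub)] == sub)

# Python's str.count, like the string-substitution hack, is non-overlapping:
"CCCC".count("CC")               # 2
overlapping_count("CCCC", "CC")  # 3 (starting at positions 0, 1 and 2)
overlapping_count("CCC", "CC")   # 2 -- the two CC digrams Omei expected
```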
jandersonlee
Actually, I should have ignored the arbitrary -20% cutoff and included all of the following above:



A slightly stronger case for long stretches of As and G/U speckled A loops.
jandersonlee
Hmm. When I constrain the data to round 93 results, the correlation values are even stronger, with the number of As at a -0.67 correlation to cluster size, AA at -0.594 and AAA at -0.519. New sheet to follow.
jandersonlee
Rhiju wrote:

A probable explanation of the result is here (note that depurination primarily affects A's):

http://nar.oxfordjournals.org/content/38/8/2522.abstract

And so it seems. When I restrict the MS2 data to round 93 and clusters of 10 to 250 in size, it seems that the strongest correlation to cluster size is simply the number of As, followed by AAs and AAAs. In sequences of length three, having at least two As is bad news, regardless of the third base.

Google sheet here.

So: a high percentage of adenine seems adverse to yield. That GAA, AGA and AAG are also fairly bad is likely due to their presence breaking up large runs of As in open loops. Avoiding large open loops seems one way to reduce the excess As.

jandersonlee
I was looking at correlations of sub-sequence vs. score and sub-sequence vs. cluster size, but spreadsheet conflicts are casting doubt on the results. I'll see if I have the time to start over.
jandersonlee
When you restrict the cluster size to 20..250 and the score to 50..80 to help eliminate outliers, there is still a large negative correlation between %As and cluster size (-0.641). Even if you limit the %As to under 40% the correlation is still very large (-0.606).

However, there is no strong correlation between subsequences and score, with the largest correlations (of sequences up to length 3) being 0.148 for the subsequence UAC and -0.136 for the subsequence UAA.

https://docs.google.com/spreadsheets/...
johana, Researcher
This is a great analysis!

Limiting the total number of As seems to be the most straightforward method for achieving better yields and it's something we will implement for the next round.

Unfortunately we have no direct control over the actual synthesis but I'm looking forward to seeing the results, if any, from limiting the A percentage.
Eli Fisker
Finally, here is one more really good reason why repeat A's can be a problem, besides the depurination explanation that we also got.

macclark52: Eli or somebody may already have posted this, but here is a neat news story on multiple AAAs in a mRNA stalling out protein synthesis.

AAAAA Is for Arrested Translation 

I had not. So big thx to Macclark for sharing. :)
Omei Turnbull, Player Developer
This news story actually fits quite well with our investigations of poly-RNA in the SHAPE labs.

For those that weren't active at that time, we found quite clear evidence that poly-A chains longer than about 6 bases started forming some kind of 3D structure of low enough energy that it protected them from the SHAPE probe. Whatever it is, this could be the same 3D structure that is jamming up the translation mechanism. Now if we could only figure out what that structure is ...

Another thought. So far as we know, the variation in cluster counts originates in the DNA amplification that gives us the RNA to test, not in the part of the process that actually experiments with the RNA. As such, a poly-A RNA structure might not be relevant to cluster variation. But since we don't know what poly-A RNA is doing, perhaps poly-A DNA does something similar.

Rhiju, suppose we were to submit some designs in pairs, one containing a long poly-A sequence, with its matching pair being identical except for having some of the poly-As replaced by Gs. If breaking up the poly-A strings with Gs raised the average cluster count, would that be evidence in favor of 3D structure, as opposed to depurination, being a (the?) cause of the inverse correlation between poly-A and cluster count?
salish99
So, I was looking at the correlation between ddG and ESc & FSc, and between ddG and cluster size.


This looks like Mother Holle (Hulda) is shaking out her pillows more than anything else, so there appears to be no direct correlation...

Also, the number of clusters and ddG show a centralized, well, clustering of the data:


So, I finally looked at the error:
Obviously, for a single cluster, there is no error, so those data are left out of here:

To me, this raises the question of how much error we are willing to accept, and what would be a good cut-off value for the data. We could simply set a 6 sigma boundary on the data, make it one-sided (there can never be too low an error) and make the cutoff there; that's, say, 3.4 faults in a million. I leave this open for discussion.
johana, Researcher
Great plots.

6 sigma is a tall order. Although it is often used in manufacturing (http://en.wikipedia.org/wiki/Six_Sigma), even the Higgs boson was originally only confirmed at a 5 sigma level.

In biology, p-values of 0.05 (~2 sigma) or 0.01 (2.6 sigma) are often used and I think that they are appropriate for guiding us as we learn more about designing switches.
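The sigma levels quoted here can be reproduced from the Gaussian in a couple of lines (a sketch; the two-sided convention is assumed):

```python
from statistics import NormalDist

def two_sided_sigma(p: float) -> float:
    # Sigma level whose two-sided Gaussian tail probability equals p.
    return NormalDist().inv_cdf(1.0 - p / 2.0)

two_sided_sigma(0.05)  # ~1.96 sigma
two_sided_sigma(0.01)  # ~2.58 sigma
```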

It is also worth noting that the histograms for the Kd values in log-space, which should be accessible with the results for each design, are not perfectly Gaussian even though they look close by eye.

From the beautiful graphs above, designs with 100 clusters seem enough to always give a fold-change error within a factor of about 1.2 (by eye). 2 sigma at that level corresponds to fold-changes of ~1.4, or a switch subscore of about 4. Most designs will be even better defined.

Designs with 11 clusters seem enough to always give a fold-change error factor within 2. In practice, most designs have a lower error. The conclusion is probably that more clusters is always a good thing and reaching more than 10 clusters is a good starting point for ensuring a tight confidence interval. To evaluate a single design, the best is probably to look at the associated error for that particular design.

I'm looking forward to seeing more comments about this.
salish99
Sounds good to me, I have estimated the effect of a cutoff at 10. We lose approx 30% of the data (assuming we already lost all 0 cluster data, as that is not included in our set). Data of R93 only.
Quite a chunk, and most of it high scoring. Maybe a lower cutoff would also suffice...

jandersonlee
But, if we target %A≤40, we should get a lot more cases with clusters≥10.
salish99
Yes - let's hope we can hard-code that as a new "limitation" into the puzzle constraints. I have been "U-ing up" most of my new designs in round 3...
salish99
Data and graphs available here:
https://www.dropbox.com/s/m8os67jku12...
salish99
Oh, and could you include the "0 cluster" data in your R93 export (especially now that the miRNA is out)? Thx.
jandersonlee
More on %seq vs ln(clusters). Looking at sequences in MS2/93 with 20%≤A≤50%.

The %A has the strongest (negative) correlation, followed by C and U. The %G is not very correlated with the cluster count. If you reduce the range to 20% to 40% for A, the correlations diminish somewhat, and there is a lot of spread.

Still, these graphs give some rough ideas of possible target percentage ranges.
salish99
Good correlations, j!
Omei Turnbull, Player Developer
Additional descriptive data

Meechl sent me a PM saying she had created an extended spreadsheet with some additional columns, i.e., secondary structure, an estimate of the free energy, and base pair counts/frequencies for both states.

I created a Google fusion table from her spreadsheet.
Eli Fisker
Hi Omei and Meechl!

This is extremely helpful. Big thx! :)
Eli Fisker
I got inspired by playing around with the fusion table and ended up writing a small intro on how one can get started using one.

Have fun! :)

Intro to Fusion Tables
jandersonlee
Thanks Eli! I've been struggling trying to figure them out.
salish99
I find them of limited use - my xlsx sheets sadly never upload, for one, due to the extensive tab use (I suppose) or the graphs.
Omei Turnbull, Player Developer
Structure metrics for round 2 submissions

I've calculated some structure metrics for (each state of) each round 2 submission and merged them with the previous data to create a new fusion table.

The metrics are:

* Stem segments. This is the number of distinct groups of consecutive ( or ) characters in the estimated folding. For example, .((((((((....))))((((....))).))))).. would be counted as having 5 segments.
* Hairpin loops. This has the obvious interpretation. The above example has 2.
* Unpaired bases. The number of dots in the secondary structure, 12 for the above example.
* Dangles: Unpaired bases at the ends of structure, 3 in the above case.
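All four metrics are simple to compute from a dot-bracket string. A sketch (my own reimplementation for illustration, not Omei's actual spreadsheet formulas):

```python
import re

def structure_metrics(db: str) -> dict:
    return {
        # Stem segments: maximal runs of consecutive '(' or ')' characters.
        "segments": len(re.findall(r"\(+|\)+", db)),
        # Hairpin loops: a run of dots directly enclosed by '(' and ')'.
        "hairpins": len(re.findall(r"\(\.+\)", db)),
        # Unpaired bases: every dot in the structure.
        "unpaired": db.count("."),
        # Dangles: unpaired bases hanging off the 5' and 3' ends.
        "dangles": (len(db) - len(db.lstrip("."))) +
                   (len(db) - len(db.rstrip("."))),
    }

# The example from the post: 5 segments, 2 hairpins, 12 unpaired, 3 dangles.
structure_metrics(".((((((((....))))((((....))).))))）..".replace("）", ")"))
```

(The final `.replace` just guards against a full-width bracket; with a clean ASCII string it is a no-op.)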

In each case, there are three columns -- metric, metric_2 and metric_delta. If anyone has a desire to see other metrics and I can calculate them with a Google sheet macro, I'll add them. If it requires a more complex calculation, we'll have to discuss the effort/reward tradeoff.

If the fusion table UI doesn't suffice for your needs (it is most certainly limited), you can easily get a CSV file with the File/Download command and load that into a spreadsheet, R, or whatever toolset you use.
Eli Fisker
Omei, this is simply amazing. I have long been wishing to have data like the number of stems available. I see what you've done. :)

Since you asked, here are a few other ideas for what could be useful. Just mentioning and not expecting:

- Length of stems.

This is something I have been counting in a past spreadsheet, since I found it useful for categorizing lab structures, like pressured designs that are hard to solve. A large number of short stems makes lab puzzles harder to solve.

One more idea (expecting this one to be particularly hard) of things I find useful:

- How many of the stems are actually estimated to be in the switching area?

For switches, what really counts when counting stems is whether they are judged to be in the switching area, because if they all are, the switch will generally be harder to solve, while not necessarily impossible. The majority of the winners we have gotten back on MS2 switches and microRNA are partial switches, not full moving switches - not all of the stems appear to be switching. Most of the Exclusion lab designs seemingly had the aptamer sealed up and static in one end. And several of them had an extra static stem form instead of having dangling bases.

One day I would love eternabot to have a chat with Meechl's spreadsheet, your fusion table and DataMiner. If the bot knows the normal number of stems (interchange with any other interesting feature as well) in winners for a particular kind of structure, then it can hedge its bets better. Read the rules from the data. :)
Omei Turnbull, Player Developer
Thanks, Eli. I like your stem length suggestion. Do you think a single value for the average stem segment length would be useful? That would fall in the easy category. Eterna has Javascript code for determining all the stem lengths, and that code should be usable inside a Google spreadsheet, so it would involve an intermediate amount of work. But it would also produce a bunch of values, which makes it harder to draw simple comparisons. Do you know what you would do if you had the full list of all stem lengths?

Differentiating between stable and switching segments is also probably of intermediate difficulty -- not hard, but more than can be done with a reasonably compact spreadsheet formula. I'm inclined to see how much utility we can get out of the low-hanging fruit (i.e. simple formulas) for now. We've only got 10 more days to learn what we can from round 2 and apply that to round 3.

And yes, I think that while looking at round 2, we can develop specific recommendations in a form that Eternabot could use to predict results for round 3, before the round 3 measurements are made. And I think it would make pretty good predictions. :-)
Eli Fisker
Even a stem length average will be useful.

I think that, together with your stem segment count, it could help us shoot towards a more optimal value for both.

By the way, one comment on your find in your Maslow analysis. You found that having more stem segments had a positive influence on the folding score. I can add that many of the designs that did well actually added an extra static single stem, either to stabilize one end of the aptamer, or in some cases directly in the middle of the switching area, which I suspect is for bringing the switching elements closer to each other. (https://docs.google.com/document/d/1u...) Both are things I think aid the switching.

Page 2: https://docs.google.com/document/d/1L...

I originally used stem length of different labs to single out what would make a hard static design.

Background documents:
https://docs.google.com/spreadsheets/...
https://docs.google.com/document/d/1a...#

I basically think the same can be done for switches. Stem length together with switching area will help show what will be most optimal.

Oh, and I'm very pleased to hear that differentiating between stable and switching elements might not be a real hard nut to crack. But good point to focus on what can help us for next round.
Omei Turnbull, Player Developer
OK, I added two new structure metrics. One is the average stem segment length. The other is the position number of the first closing base pair of the first hairpin (LowHairpinPos). So for example, if the 5' end of the structure started with

...((....))..........

LowHairpinPos would be 5. It's kind of an ad hoc metric, but it was easy and it seemed like it might be interesting.

This new URL is https://www.google.com/fusiontables/D...
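My reading of the LowHairpinPos definition, as a sketch (a hypothetical helper, not the actual spreadsheet macro):

```python
import re

def low_hairpin_pos(db: str) -> int:
    # 1-indexed position of the innermost '(' of the first hairpin loop,
    # i.e. the opening base of the pair that closes it; 0 if no hairpin.
    m = re.search(r"\(\.*\)", db)
    return m.start() + 1 if m else 0

low_hairpin_pos("...((....))..........")  # 5, matching the example above
```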
Eli Fisker
Thx Omei!

I like it :)
Omei Turnbull, Player Developer
Johan, I've been looking forward to seeing the binding curve image files that we got for Round 1. All the numeric statistics are great for finding designs of interest. But when it comes to evaluating a specific design, that one image is much more informative than a row of numbers.

Are they still in the plan for round 2? Or perhaps they already are available and I just missed that fact?
johana, Researcher
Hi Omei,
The binding curve images made less sense this time around, since we fit the individual clusters rather than the median values at each concentration, i.e., the fully colored dots on the binding curve plot.
If you found these curves more useful than the histograms we showed this time I can try to generate something similar, although the curves will be based on the individual cluster fit medians.
Thanks for bringing this up!
salish99
Interesting difference. Is that because we did the experiments at a different concentration overall?
Omei Turnbull, Player Developer
Thanks for being willing to do this, Johan. Unless I'm misunderstanding which histograms you're referring to, the histograms summarize data about the set of designs as a whole. The binding curves serve a different purpose -- a pictorial representation of the parameters for each design. This is really helpful when trying to compare two designs that are similar in structure and/or sequence and yet have significantly different Eterna scores.

I would appreciate still seeing the individual dots in the image, even though you changed the way you calculate the end results. They are a visual indication of the number of clusters for the design, which is an important consideration.
johana, Researcher
There should be detailed histograms for each design, i.e., one image for each of the ~10K designs, but I can't find them either. Thanks for letting us know and stay tuned...
Omei Turnbull, Player Developer
Ah. I wasn't aware of those. I look forward to seeing them.
johana, Researcher
The result images of the 6 sublabs for MS2 Riboswitches On Chip can be accessed through the results browser:
http://eterna.cmu.edu/web/browse/5448...
http://eterna.cmu.edu/web/browse/5448...
http://eterna.cmu.edu/web/browse/5448...
http://eterna.cmu.edu/web/browse/5448...
http://eterna.cmu.edu/web/browse/5448...
http://eterna.cmu.edu/web/browse/5448...

It's unclear why we can't access them from the puzzle pages, but we're working on it.
Omei Turnbull, Player Developer
Thank you. Those are interesting.

But it does seem like the one image we got in round 1 more succinctly conveyed a lot about how the switch behaved. I don't know how much effort they took, but I, for one, would really appreciate seeing them.
jandersonlee
Much appreciated. A few naive questions from someone a bit late to this party: What are Kd and Fmax and how should one read/interpret these graphs?
Omei Turnbull, Player Developer
After fitting the data to a sigmoidal curve, FMax is the top asymptote of the curve, and Kd is the concentration at which the intensity is half the maximum.


The Fold Change, which is the basic measure of switching, is the ratio of the two Kd's. Since the horizontal axis on the graph is log scaled, the horizontal distance between two Kd values on the graph directly corresponds to the (log of the) Fold Change.
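The curve being fit is presumably a standard single-site binding isotherm; a sketch of the functional form (the actual fitting procedure isn't described here):

```python
def binding_signal(conc: float, kd: float, fmax: float = 1.0) -> float:
    # Single-site binding isotherm: signal rises from 0 toward Fmax,
    # hitting exactly Fmax / 2 when the concentration equals Kd.
    return fmax * conc / (kd + conc)

binding_signal(100.0, 100.0)  # 0.5: at conc == Kd, the signal is half-maximal
```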
salish99
Kd is the MS2 concentration at the point where the UV-vis level is half the maximum.
salish99
hm, maybe I should wait for the page to finish loading... see Omei's picture above for explanation, much better than my ramblings...
Omei Turnbull, Player Developer
Comparison of similar designs - the Maslow mods

I'm particularly interested in comparing designs that have similar sequences but a large difference in scores. With that in mind, I selected the 58 designs from round 2 that had Maslow in the name, and looked at how the various metrics derived from the predicted secondary structure of each state could be used to predict and improve the Eterna score. (Actually, I focused on the FoldChange value rather than the total Eterna score.) I came away with some specific recommendations for improving the scores for Maslow-based designs in Round 3. For the specifics, see Variations on the Maslow design, Round 2.

Leaving aside the specific numbers, I'm going to go out on a limb and hypothesize that the following general recommendations will generalize to all the exclusion puzzles.

For the next round, if you are making mods to existing designs, consider:

* Decrease the number of unbound bases (increasing the number of bound pairs), in both the ON and OFF states.
* Pay particular attention to increasing the AU pairs (in absolute numbers) and the GU pairs (in percentage, i.e., from none to at least 1 or 2), in both states.
* Consider increasing the number of stem segments (as opposed to combining segments, or simply extending the existing segments).

These recommendations are made with the intent of nudging average scores higher. Each design still has to be judged on its own merits, and many good design mods will undoubtedly move in the opposite directions.
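Tallying AU/GC/GU pairs for a given state takes only a stack over the dot-bracket structure. A sketch with a toy sequence (not one of the lab designs):

```python
def pair_counts(seq: str, db: str) -> dict:
    # Pair up bases with a stack over the dot-bracket structure,
    # then tally each pair type (order-insensitive: UA counts as AU).
    counts = {"AU": 0, "GC": 0, "GU": 0}
    stack = []
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            pair = "".join(sorted(seq[j] + seq[i]))  # 'AU', 'CG' or 'GU'
            key = {"AU": "AU", "CG": "GC", "GU": "GU"}.get(pair)
            if key:
                counts[key] += 1
    return counts

pair_counts("GAUAAAAUC", "(((...)))")  # {'AU': 2, 'GC': 1, 'GU': 0}
```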
Eli Fisker
Beautiful work, Omei!

This is very practical advice for designing plus a fine guide to good settings for the fusion table. Nice analysis. :)
jandersonlee
I notice in looking at the R93 results that there seem to be some common cases for how relatively high-scoring designs "missed" and have some suppositions on how they might be tweaked.

High Eterna_Score, Low NumberOfClusters
example: Exclusion 1 (5480276) 'Nebkaure Khety II'
Typically these have a high percentage of As (e.g. 54.1%), often unbonded. Reduce the number of As.

Exclusion: High Baseline_Subscore and Folding_Score, Low Switch_Subscore
example: Exclusion 1 (5490401) Tirebi
MS2 bonds well in the ON (FMN-) mode, but the low switch score means there is little difference between the ON (FMN-) and OFF (FMN+) curves. Need to change stacks and boosts to increase the kcal delta between the states. If baseline is high but not 30, may also need to improve the stability of MS2 hairpin in ON (FMN-) mode (e.g. clean up dotplot?).

Same State: High Folding_Score, Low Switch_Subscore
example: Same State 1 (5510749) Garnet 75
MS2 bonds well in the ON (FMN+) mode, but the low switch score means there is little difference between the ON (FMN+) and OFF (FMN-) curves. If baseline==30, we probably need to change stacks and boosts to increase the kcal delta between the states. If baseline is less than 30 it may be too hard to form the MS2 arm in the ON state meaning that a higher concentration of MS2 is needed for its bonding bonus.

less often:

Exclusion: High Switch_subscore, Low Folding_Score
example Exclusion 3 (5498302) salish99-ex3-25
The low folding score means it doesn't get a strong MS2 signal in the ON mode, meaning the MS2 arm is not forming well. It could be too many misfolds of that region. Check the dotplot. High switch score may mean the kcal delta is too large between the states, so it is very slow to switch.

Does this make sense? Anyone see it differently?
salish99
Excellent summary. As for 5498302, I typically tried to have clean dotplots for the early series (numbers 10-30), but I had difficulty doing this, especially for the ex3 and ex4 series...

Eli Fisker

  • 2328 Posts
  • 541 Reply Likes
I started making spreadsheets and splitting data for individual labs and adding a few categories that I thought might reveal something interesting. I ended up making drawings from those. What I find interesting is that many of the lab winners have a fairly similar pattern for switching - that each lab generally has a few variant types that work.

I have been making drawings for the exclusion labs also now. But I added the rest too, since I have redone some. I have mainly been interested in the winners and designs scoring over 80.


Lab drawings

What MS2 wants

I think it is not just the aptamer that wants space around it so it gets its optimal door colors. I think the MS2 hairpin will often need it too.

With part of the FMN aptamer sequence next to it, as in the exclusion labs, it doesn't have that full luxury of free choice that the Same State and microRNA labs have.

It is possible to solve these exclusion puzzles, as there have been winners in most of the exclusion labs. There are just very few of them, compared to the Same State designs and the microRNA labs. Another thing: all the labs that turn on the MS2 in state 2 (Mir 208A, Same State 1 and Same State 2) have been doing better than the turn-off labs.

Now in the Exclusion labs, MS2 can’t as easily have these gate doors on both sides of the MS2 hairpin, since the aptamer sequence next to the MS2 sequence is already a given.

So instead the FMN sequence in some cases gets made into one of the MS2 gate door sequences, with a complementary sequence on the other side. This is the case in at least one of the solving variants in all of the Exclusion labs. (Exclusion 1 A + B, Exclusion 2 A + B, Exclusion 3 A + B, Exclusion 4 B)

The MS2 hairpin can get turned on and off in different ways. But what characterizes many of the winners is that they get turned on and off in specific ways.

Background articles:

Aptamer doors
MS2 gate doors


Exclusion 1 and 4

MS2 is on the outside of the FMN sequences. Since the aptamer sequence, which contains twin G’s and A’s, gets made into an MS2 gate door sequence, its complement will be U- and C-heavy. That CU element also often gets used for turning the MS2 hairpin off. So these magnet segment doors for the MS2 hairpin have a similar function to the sequence around the MS2 hairpins in the logic gate puzzles, despite the base composition being different.


Exclusion 2 and 3

Here the MS2 sequence is between the FMN aptamer sequences. The trend is to place a magnet element on the side of the MS2 that does not involve the FMN sequence. It can be either a G or a CU element, used both for turn-on (together with the FMN sequence on the other side of the MS2) and for turn-off of the MS2 hairpin. Also, one of the winning solves went without the G and C magnet segments and instead used mirroring of the MS2 hairpin for turn-on and turn-off. (Exclusion 2 A)

Otherwise, the trend is that the magnet element placed after the MS2 either pairs up with the FMN sequence before the MS2, or with the MS2 G’s if it is a CU element, or reaches for the MS2 C’s if it is a G magnet segment.


MicroRNA

The microRNA labs had many winners, considering that it was the first round for them.

The majority of the 208a solves had a quite different way of solving the MS2 hairpin compared to the MS2 labs: far fewer magnet segments and more mirror complementarity to an MS2 fragment, which gave long MS2 gate doors on either side of the MS2. Only a minority of the top scorers had a G or C magnet segment in use to turn off the MS2 hairpin.

Of course the microRNA labs have another reason for not needing as many magnet segments, as the FMN aptamer is not there. The twin G's segments in FMN are partly responsible for introducing C magnet segments, and these can sometimes pair up with magnet segments in MS2 or share them. So that too is a reason for the different solving style in the microRNA labs. But the microRNA labs show the same pattern as the XOR puzzles with MS2 gate doors - with sequence just before and just after the MS2 hairpin. I think we are going to see a lot of that in the future.

MS2 sequence highlighted. Focus on the area close by. Notice the palindrome-like sequence around the MS2 hairpin: it's a fragment from the beginning of the MS2 sequence, mirrored onto both sides of the MS2.

Messy spreadsheets

Exclusion 1
Exclusion 2
Exclusion 4
Same State 1
Same State 2
Mir 208A
Turnoff variant 2, v2

I have also added a bit of details on some of the labs in my thoughts about the MS2 data:

https://docs.google.com/document/d/1u...

Plus I'm beginning to sum up the document:

https://docs.google.com/document/d/1u...

Eli Fisker

  • 2328 Posts
  • 541 Reply Likes
GU content

There are differences in GU content between different solving variations in the same lab, meaning that top-scoring design siblings will have a similar amount of GU content. But since many of the labs have more than one way of solving, not all styles take an equal amount of GU’s.

By solving styles I mean that the winners often will have only 2 or 3 main ways of solving: they switch in the same spots and use the same way of doing it. Just like there are design siblings.

So I think GU will partially depend on how one goes about solving the puzzle. A complementary-mirroring solving style leaves a different amount of GU content than a magnet-segment solving style.

When I talk about magnet segments, I mean designs that use small, short elements of C/U's (often to bind up with the MS2 G's or the twin G's in the aptamer) or short elements of G's that target the MS2 C's.

The other type usually uses longer stems, makes them complementary to one or more pieces of the MS2 sequence, and has that fragment mirroring outside.

The differences of GU between different types of design can be seen in some of the spreadsheets in the above post.

salish99

  • 295 Posts
  • 58 Reply Likes
Nando, great implementation on the A-dometer, thx!

jandersonlee

  • 555 Posts
  • 131 Reply Likes
Does anyone have (or could generate) a spreadsheet to add to the mix with the predicted Vienna and Vienna2 MFE values for the two states (FMN- and FMN+) of the candidate submissions? I have some ideas to check out regarding this, but the idea of manually determining and entering the 4 values for each of several thousand lab designs seems daunting.

jandersonlee

  • 555 Posts
  • 131 Reply Likes
I see that the fusion tables have a Free_E_delta field for at least one of the energy models.

Omei Turnbull, Player Developer

  • 1026 Posts
  • 332 Reply Likes
Yes, Meechl pulled the estimated free energy values for the two states using a script on Nando's development server. (http://nando.eternadev.org/web/script...) These should match up with what you see in the game UI when using the default Vienna (version 1) energy model.

I say "should" only because I think that I have seen numerous cases where the bound values didn't match up. But when I asked Nando about it and he asked for an example, I couldn't immediately find one. So it may only affect some labs, or maybe I'm just going senile. If you do come across a lab where the UI doesn't match the values in the fusion table, please post it.

I tried modifying Nando's script to use the Vienna2 library by simply changing the library name, but that didn't work. I'll be asking Nando about what it might take to get those values into a file.

nando, Player Developer

  • 393 Posts
  • 74 Reply Likes
For Vienna2, use the example at http://nando.eternadev.org/web/script...
The foldWithConstraint method is simply a little different.

Omei Turnbull, Player Developer

  • 1026 Posts
  • 332 Reply Likes
Nando, I interpreted that as saying I should simply replace the vrna and LibVrnaXXX.prototype.foldWithConstraint definitions with those from the URL posted above.

But when I tested it on design 5502357 (Mat - Exclusion 1 - Eli mod - D91) with these values

data: 5502357,GAAAAACAUGAGGAUCACCCAUGUAAGGAUAUGCUGAAAAACACGGAAAAAACGUGCAGCAGAAGGUACAAAAG

cst:
........................|xxxxxx|............................|xxxxx|.......

I got the result

5502357,GAAAAACAUGAGGAUCACCCAUGUAAGGAUAUGCUGAAAAACACGGAAAAAACGUGCAGCAGAAGGUACAAAAG,-15.60,-12.90,.....(((((.((....))))))).......(((((.....((((.......))))))))).............,........((.((....))))((((......(((((.....((((.......))))))))).....))))....

But the UI for that design, at http://eterna.cmu.edu/game/solution/5..., has the Vienna2 energy values as -15.5 and -12.9. Close, but not identical unless there is some funky rounding going on.

Trying a second design (Tirebi), I get the result 5490401,GCCUCACAUGAGGAUCACCCAUGUCAGGAUAUAGGAGACGGCAGAAAUCCUGCCAUCGCUAGAAGGGACUGGCA,-25.30,-22.50,.(((.(((((.((....))))))).)))........((.(((((.....))))).))(((((......))))).,.((((....)))).....(((.(((......(((..((.

where the UI says -25.7 and -22. So the calculations definitely seem different.

(FWIW, the numbers in your comment for Tirebi are -25.30 and -23.10.)

Did I oversimplify the change needed to get the same numbers as the UI?

Perhaps the effort it would take to get the fusion table and UI exactly matching isn't worth the benefit. If the only difference is minor variations due to slightly different energy parameters and we have no reason to think one is better than the other, I should just let it drop.

Omei Turnbull, Player Developer

  • 1026 Posts
  • 332 Reply Likes
Nando, in the thread just below, Jeff reported that the Vienna1 energies didn't match up between the game UI and script for the JL Maslow Revisited mod 14 (5483048) design. This is an example of a Vienna 1 discrepancy that I couldn't put my hands on earlier today.

nando, Player Developer

  • 393 Posts
  • 74 Reply Likes
Let's begin with Vienna 2.

The javascript library that I generated for EternaScript was "translated" some time ago from the Vienna 2.1.1 version using a tool called Emscripten.

Very recently, the team of the Vienna package released 2.1.9. Please check http://www.tbi.univie.ac.at/RNA/chang... for some quite important details...

I reacted immediately for the game UI, updating the "alchemized" Flash build, but I haven't found the time to fix the EternaScript version yet (Emscripten is pretty complicated, and to be blunt, quite annoying)

So in conclusion, discrepancies for Vienna 2.x should be rare, but they shouldn't be too surprising. And the Flash UI is the correct one.

About Vienna 1:

The Flash code goes about it in the following way: during the dynamic programming, if the binding site is found to form the proper loop enclosed by the proper pairs, the bonus is applied. As a result, either the bonus is enough to beat a binding-site-less MFE, or it isn't. And you end up with a 100% valid MFE, which is either bound or not.

The "shortcuts" used in these batch-processing scripts work differently. The constrained folding in essence assumes that the binding site is part of the final MFE; in other words, it is assumed that the bonus will be enough to "beat" the unbound MFE. So discrepancies may occur if the bonus is actually insufficient. In most of my other scripts using these routines (the mimics-related ones for instance), there are additional (posterior) tests to prevent that. But in these batch-processing scripts, none is coded...
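
A toy illustration of the difference Nando describes, with the missing posterior check added. The numbers and the bonus value are made up; this is a sketch of the logic, not the game's code:

```python
def effective_bound_mfe(mfe_unconstrained, mfe_constrained, bonus):
    """Posterior check reconciling the two approaches described above.

    mfe_unconstrained: MFE without forcing the binding site (kcal/mol)
    mfe_constrained:   MFE with the binding site forced to form
    bonus:             negative free-energy bonus granted when the ligand binds

    The game UI applies the bonus during dynamic programming, so the final
    MFE is whichever is lower: the bonused bound fold or the unbound fold.
    A constrained-folding script without this check reports the bound value
    even when the bonus isn't enough to 'beat' the unbound MFE.
    """
    bound = mfe_constrained + bonus
    if bound <= mfe_unconstrained:
        return bound, True           # binding site present in the true MFE
    return mfe_unconstrained, False  # bonus insufficient; site absent
```

With a made-up bonus of -4.9 kcal, `effective_bound_mfe(-20.0, -15.0, -4.9)` keeps the unbound MFE: exactly the case where a script lacking the check would disagree with the UI.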

nando, Player Developer

  • 393 Posts
  • 74 Reply Likes
Oh, and about Vinnie: I updated my bot with Vienna 2.1.9, of course, but additionally I've added a couple of tunings, some based on my verifying how the Turner 2004 parameters were integrated (the Vienna team seems to have made "personal" choices there), and also a few other things related to personal hypotheses about the stability of certain loop motifs (for instance, I have this "hunch" that things like the Sarcin-Ricin loop, or the FMN aptamer, might be intrinsically a little more stable than the models predict)

Omei Turnbull, Player Developer

  • 1026 Posts
  • 332 Reply Likes
Thanks for the explanation; I understand the difference now.

My near term desire is to predict KdFMN in the MS2 labs from a partition function for the bound state. Specifically, I want to try estimating the probability of the MS2 aptamer folding into the necessary hairpin structure, and use that instead of the estimated MFE, as the basis for the prediction.

(I say "near term" instead of "immediate" because I can use Vienna as it is to predict the KdNOFMN values, so I plan to do that first. If that doesn't work well, there's no point in putting in the effort for the FMN state.)

But assuming I get promising results for the NOFMN state, I'll want to proceed to the FMN state. I had been planning on using the constrained version of Vienna for that. But you mentioned in another thread that you once had
a nodejs script that produced dot plots based on ligand+concentration. It sounds like that would be an improvement over my existing plan. Can you tell me more about it? Although seeing the dot plot is nice, it is really just the raw base pair data that I need.

jandersonlee

  • 555 Posts
  • 131 Reply Likes
Maslow Mods Revisited

Intrigued by Omei's efforts, I used one of the EternaScript distance scripts to find sequences that were "close mods" to the R93 'JL Maslow Revisited mod 14'. It found 183 sequences within distance 40 (about 10 base changes) of that sequence, not all of which had 'Maslow' in the name. A fusion table with the distance metric added in can be found here.

Looking at relatively good performing sequences (e.g. baseline_subscore≥28 and Aperc≤40 and NumberOfClusters≥10) left 55 variants showing a peak in switch_subscore somewhere between a Free_E_delta of -3.0 and -1.8 kcal.
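
Once the fusion table is exported as CSV, the filter above can be reproduced mechanically. A sketch, assuming column names matching the fusion table fields mentioned in this thread:

```python
import csv
from io import StringIO

# Tiny made-up sample in the assumed fusion-table column layout.
SAMPLE = """\
design_id,baseline_subscore,Aperc,NumberOfClusters,Free_E_delta,switch_subscore
5483048,29,38,25,-2.4,33
5480276,30,54,4,-1.0,12
5490401,28,35,15,-4.5,8
"""

def good_variants(csv_text, min_baseline=28, max_aperc=40, min_clusters=10):
    """Keep rows passing the same filter used in the post above:
    baseline_subscore >= 28, Aperc <= 40, NumberOfClusters >= 10."""
    keep = []
    for row in csv.DictReader(StringIO(csv_text)):
        if (float(row["baseline_subscore"]) >= min_baseline
                and float(row["Aperc"]) <= max_aperc
                and float(row["NumberOfClusters"]) >= min_clusters):
            keep.append(row)
    return keep
```

From the surviving rows one can then plot switch_subscore against Free_E_delta to look for the peak described above.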

It is left as an exercise for the reader to do the same for other high scoring R93 variants.


jandersonlee

  • 555 Posts
  • 131 Reply Likes
Hmm. Interesting. The highest scoring sequence (JL Maslow Revisited mod 14), which I accidentally left out of the graph, has Free_E_delta listed as 0.0 although the lab tool shows a delta of (-19.1 - -21.9 =) 2.8 kcal in the Vienna2 energy model and (-21.5 - -24.3 =) 2.8 kcal in the Vienna energy model.

How far should we trust Free_E_delta values in the Fusion Table I wonder?

Omei Turnbull, Player Developer

  • 1026 Posts
  • 332 Reply Likes
I checked the two energy estimates in the fusion table (both -24.3) with the script on Nando's server (http://nando.eternadev.org/web/script...) that Meechl used, and they are consistent. So the 0.0 for JL Maslow Revisited mod 14 is using the same energy calculations as the other points in your graph above.

But you found what I couldn't find earlier today: a discrepancy in Vienna1 estimates between that script and the numbers currently showing up in the UI. In the thread just above, I had just posted some discrepancies in the Vienna2 results. But the differences you found are even larger. I'll bring these to Nando's attention, although I think he is following this topic anyway.

It would certainly be nice to have a way of getting spreadsheet/database numbers that match the UI values, just to limit the confusion. But I doubt we're at the point where there is a clear choice as to which energy model is best.

Turning it around, somebody could probably look at all the data Johan is generating and determine which of the various energy model versions floating around does the best job of predicting it.

jandersonlee

  • 555 Posts
  • 131 Reply Likes
That's why I'd like to have both energy models: so we can try to correlate between model predicted values and lab results.

salish99

  • 295 Posts
  • 58 Reply Likes
"It is left as an exercise for the reader..." made my day. Thanks, j! :-)

jandersonlee

  • 555 Posts
  • 131 Reply Likes
A more complete description of how to do this "homework" can be found in a Google Doc Creating a Nearby Solutions Fusion Table. Many thanks to Eli for his assistance and suggestions on this document.

Omei Turnbull, Player Developer

  • 1026 Posts
  • 332 Reply Likes
Jeff and Eli, great job on the Creating a Nearby Solutions Fusion Table doc!

jandersonlee

  • 555 Posts
  • 131 Reply Likes
Another targeted fusion table, this one based on variants of the R93 Exclusion 2 submission (5504382) 'Mat - Same state - Mayanna Mod - D121'.

This one looks to me like it should have a Free Energy delta somewhere between 1 and 3.

Hyphema

  • 91 Posts
  • 25 Reply Likes
I like that graph JL, do you think you can do a Maslow design version of that? I know you did some mods as well as I so perhaps that can shed some light there.

jandersonlee

  • 555 Posts
  • 131 Reply Likes
Look up. The previous graph was some Maslow variants. If you want a specific root sequence, let me know and I'll see if I can walk you through generating it.

Hyphema

  • 91 Posts
  • 25 Reply Likes
Oops, I must have missed that graph, thanks! I do really need to know how to do that. I would be interested to see what changed in my mods to have the score vary from 30-something to 70-something. Knowing how to use the table would help with nitpicking the various items like energy etc.

nando, Player Developer

  • 393 Posts
  • 74 Reply Likes
This thread has gone above freak length anyway, so I'll just add my own grain of salt too :P

I've been trying to make sense of why one of Vinnie's designs did so well in Same State 2. "Soteed" scored 95, with a very reasonable confidence (~50 clusters). I didn't find anything special in the various metrics of the design. Finding myself without any clues, I tried something I hadn't considered in a long while:



This is the output of the barriers program from the Vienna package, see http://rna.tbi.univie.ac.at/cgi-bin/b...
The rest of the musing below takes into account that one needs to trust both the folding engine and the design choices made for computing the free energy landscape in the barrier tool. Now that we know we're assuming certain things (that the tools above are more or less precise and predictive), let's see where it takes us.

At first, this above graph wasn't surprising to me. I was expecting to see these 2 more or less well defined "funnels", indicative of 2 main basins in the free energy landscape of the polymer. Something did kinda blow my mind though.



For designing "good" switches, I always assumed that one would need to make the barrier as low as possible. But this seems to be (possibly very) incorrect.

One needs to understand here that the height of the free energy barrier does not affect the relative concentrations of the various possible conformations; only the differences in free energies between states do. So if, for instance, your state A is 3 kcal more unstable than your state B, then your state B is about 5^3 = 125 times more populated in the solution than your state A, with an important caveat here: when the solution has reached equilibrium.

What the free energy barrier height does affect, though, is the transition rate; in other words, how fast and often individual RNA polymers switch from state to state.
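
The two points above can be sketched numerically: equilibrium populations depend only on the free-energy gap, while switching speed depends on the barrier. Taking RT ≈ 0.62 kcal/mol near body temperature (my choice of constant), each kcal/mol of stability is worth roughly a factor of 5 in population:

```python
import math

RT = 0.62  # kcal/mol near 37 C (approximate)

def population_ratio(delta_g):
    """How much more populated the lower state is at equilibrium, for a
    free-energy gap of delta_g kcal/mol. Barrier-independent."""
    return math.exp(delta_g / RT)

def relative_switch_rate(barrier):
    """Arrhenius-style relative rate of crossing a barrier (kcal/mol).
    The prefactor is omitted, so values are only useful for comparison."""
    return math.exp(-barrier / RT)
```

`population_ratio(3.0)` comes out near 125, matching the 5^3 estimate above, while `relative_switch_rate(30.0)` is vanishingly small: molecules behind a 30 kcal barrier essentially never switch on experimental timescales.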

The result above seems to show that a good switch can accommodate a surprisingly high barrier. But is it actually a surprise? Could it actually be some sort of a rule? Something like:
  • if a barrier is too high (>30 kcal maybe?), the RNA polymers tend to stay stuck in their states, and no switching can occur in a timely fashion (remember Hand & Finger? or Lines by Starryjess?)

  • if a barrier is too low, the molecules switch so fast that the probes are essentially useless, unable to distinguish between the states


If you're still reading this, you're probably crazy enough to go take a look at a scientific paper ;) http://www.ncbi.nlm.nih.gov/pmc/artic... The subject is kind of related, but more importantly, I'd like to draw your attention to Figure 1, especially the top part. It seems to indicate that specific conformational transition types are typically seen at specific energy scales and in specific time spans. Specifically, secondary structure transitions are sort of expected to occur when free energy swings by about 17 kcal.

Maybe this could also explain the problems encountered with the Exclusion targets. When I designed them, I thought it was important to make them so that the MS2 and FMN motifs would naturally exclude each other, giving the least possible chances for a conformation with both present. But it turns out, nearly all Exclusion designs I tried in the barriers website returned a very flattened free energy landscape (well, compared to the above one). It looks as though the relative position of the motifs in the sequence "force" the designs to switch easily by way of a "zipping" mechanism (a stack is built simultaneously while another is 'unmade')

Ok, I suppose I inflicted enough, I'll stop with the rambling here :)

jandersonlee

  • 555 Posts
  • 131 Reply Likes
I tried exploring the barriers idea in some of the early switch labs and found it didn't seem to make too much difference. My miRNA designs, which did do well last round, were mostly of the zipping kind though, so it seems it possibly *can* sometimes help.

Eli Fisker

  • 2328 Posts
  • 541 Reply Likes
Hi Nando!

Thx for the entertaining "rambling". While I do not understand all the energy details, I thought I would add some thoughts on element positioning, and on why the zippering/mirroring solving style has value, and when in particular.

I suspect that we are going to see much more of the zipping, because of the results I see in the microRNA labs and even in the in silico solves of the logic gate puzzles. I think we just don't see it fully yet, due to the positioning of the FMN aptamer right next to the MS2 sequence in most of the MS2 labs.

I think we will see far more complementarity with next-door sequences (zippering), for the following reasons:

If FMN and MS2 are in the same puzzle, each in its own state, distance is important.

- Too close, like next-door neighbours, I think is bad, because that stifles the options for FMN aptamer and MS2 gate doors (both FMN and MS2 like to have them) and thus reduces the number of potential solving ways for both FMN and MS2.

- Too far apart is bad for the switch happening. A solve seen: making a static stem of single bases between FMN and MS2 to bring them closer in 3D space.

- Having MS2 in between FMN1 and FMN2 is better than having it on the outside at either end of the sequence. I think this is why it is worse having MS2 at either extreme end of the RNA sequence: in most of the labs, MS2 has more than one fragment of it binding up elsewhere for turnoff. I think that is particularly the FMN aptamer's fault, due to its position next to MS2. Most of these top scorers also typically solve in a magnet-segment style, using aptamer G’s or adding C segments.

- I expect the complementary solving style to grow stronger compared to the magnet-segment solving style when the MS2 and FMN get a little farther apart, or when the lab doesn't have FMN. However, I think the magnet-segment solving style will still have a place as long as the FMN is there, as it partially provokes it: its twin G’s are a convenient anchor for the MS2 C’s or an added C segment.

The microRNA labs show a strong preference for having just the sequence at one side (the right side) for turn-off. (This complementary mirroring (the zipper) can also work at a distance; it doesn't have to be right after the sequence it is mirroring.) Also, MS2 generally likes having the two sequences just around it pair up with each other for turn-on. Having the MS2 at an end position in the RNA sequence, or FMN close by, makes this harder to achieve.

I believe the zipping mechanism/complementary solving style works by making different elements complementary enough to work together for turning either state both on and off. Often the zipping is made by four equal-length strands that are complementary, so they can pair up with each other and make two different sets of stems.

salish99

  • 295 Posts
  • 58 Reply Likes
Nice graph Nando.

I wonder how we could implement a visualization of that barrier into the graphical interface during the creation of our submissions?

nando, Player Developer

  • 393 Posts
  • 74 Reply Likes
@salish99: we probably cannot. The computation is just too expensive, and as jandersonlee pointed out, it remains to be shown that it would be useful. I wanted to mention the tool and share how intriguing the outputs were for me. Everything else is just wild guessing at this point. Though, I do believe that this is an additional argument in favor of trying other types of Exclusion targets, and also the reason why I'm rather pessimistic about the current ones.

jandersonlee

  • 555 Posts
  • 131 Reply Likes
KDOFF Important to Switch_Subscore

More lunchtime fusion table fun.

It seems that higher KDOFF correlates with better switching subscores, at least when you restrict yourself to looking at a single puzzle at a time and sequences that tended to switch at least a little (Switch_Subscore>=1) and had at least a few clusters (NumberOfClusters>=5).

Here are charts for Exclusion 3:



and Same State 2:



KDON is not correlated per se, although it does look like the higher Switch_Subscore sequences were clustered with a KDON between 10 and 50 for Same State 2, for example.
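
One way to put a number on these eyeball correlations would be a rank correlation between KDOFF and Switch_Subscore over the filtered rows. A minimal Spearman sketch in plain Python (no tie handling, so it assumes distinct values):

```python
def spearman(xs, ys):
    """Spearman rank correlation of two equal-length sequences.

    Uses the classic 1 - 6*sum(d^2)/(n*(n^2-1)) formula, which is only
    exact when there are no tied values (a simplifying assumption here).
    """
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank_, i in enumerate(order, start=1):
            r[i] = rank_
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

Feeding it the KDOFF and Switch_Subscore columns of the filtered fusion-table rows would turn "seems to correlate" into a single number between -1 and 1.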



So what design strategies would help lead to a higher KDOFF and a fixed range KDON?

(As an aside, restricting to a single puzzle at a time with high baseline and folding subscores and at least 5 clusters also tightens the correlation between %A and NumberOfClusters. Sample charts illustrating that also found in the same fusion table.)

Omei Turnbull, Player Developer

  • 1026 Posts
  • 332 Reply Likes
The switch subscore is defined as the log of the ratio between KdOFF and KdON (scaled and clamped at low and high values to map to the 0-40 range). The details are in Scoring of Riboswitches in EteRNA.
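
A hypothetical sketch of that mapping. The log-ratio form follows Omei's description, but the clamp bounds and log base here are my placeholders; the official constants are in the scoring document:

```python
import math

def switch_subscore(kd_off, kd_on, lo=1.0, hi=100.0):
    """Map log(KdOFF/KdON) to the 0..40 switch subscore range.

    lo/hi are placeholder clamp values for the Kd ratio, not the
    official Eterna constants.
    """
    ratio = kd_off / kd_on
    ratio = min(max(ratio, lo), hi)                    # clamp
    return 40.0 * math.log10(ratio) / math.log10(hi)   # scale to 0..40
```

With these placeholders, a 100-fold Kd shift maps to the full 40 points and no shift (or a shift the wrong way) maps to 0, which also shows why a KdON graph looks like a skewed version of the KdOFF graph.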

On a closely related issue, I'm investigating a technique for estimating KdOFF and KdON from the dot plots. If that turns out to be predictive, it could also suggest strategies for designing switches.

As to my document, I'm just beginning to add examples I have observed, but I welcome any comments, as well as other examples, either supporting or contradicting, that other players have observed.

jandersonlee

  • 555 Posts
  • 131 Reply Likes
Thanks Omei. That the "switch subscore is defined as the log of the ratio between KdOFF and KdON (scaled and clamped at low and high values to map to the 0-40 range)" explains why the KdON graph for Same State 2 is a skewed version of the KdOFF graph.

Still, I repeat my question:

So what design strategies would help lead to a higher KDOFF and a fixed range KDON?

jandersonlee

  • 555 Posts
  • 131 Reply Likes
@Omei: regarding the dot plots: the graphs are only as good as the models behind them. I used to spend a lot of time poring over the energy models and external tools - I even made several for myself. In the end I found that the models didn't seem to be accurate enough (as yet) to determine the fine details of why this or that sequence would score better. (They can help you get into the ballpark, though.)

For example, one tool I built suggests the following:


JL Maslow Revisited Mod 12
MS2 bonus: 0.0% FMN, 81.2% MS2
no bonus: 13.3% FMN, 66.6% MS2
FMN+ bonus: 99.8% FMN, 0.2% MS2

JL Maslow Revisited Mod14
MS2 bonus: 0.0% FMN, 68.8% MS2
no bonus: 0.7% FMN, 65.8% MS2
FMN+ bonus: 95.5% FMN, 3.0% MS2


The tool uses RNAsubopt to tally up the alternative folding shapes in predicted proportions at about room temperature according to the energy model. The two columns for each sequence predict how many will have the FMN aptamer loop versus how many will have the MS2 binding arm. The rows depend on whether you give a free-energy binding bonus to FMN+ binding, to MS2 binding, or no bonus at all.
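
A stripped-down version of that tallying idea. `RNAsubopt -e` emits one "structure energy" line per suboptimal fold; the region/motif matching convention and the RT constant below are my own simplifications, not the tool's actual code:

```python
import math

RT = 0.62  # kcal/mol, roughly room-to-body temperature

def motif_fraction(subopt_lines, region, motif):
    """Boltzmann-weighted fraction of suboptimal structures whose
    dot-bracket notation matches `motif` over the slice `region`.

    subopt_lines: iterable of 'structure energy' lines (RNAsubopt-style)
    region:       (start, end) indices into the dot-bracket string
    motif:        the dot-bracket substring we count as 'motif present'
    """
    z = hit = 0.0
    for line in subopt_lines:
        struct, energy = line.split()
        w = math.exp(-float(energy) / RT)  # Boltzmann weight
        z += w
        if struct[region[0]:region[1]] == motif:
            hit += w
    return hit / z if z else 0.0
```

Run once with the MS2-arm motif and once with the FMN-loop motif (and with or without a bonus folded into the energies), this reproduces the kind of percentage table shown above.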

From this data it seems to me like the Mod12 version switches more decisively and should therefore score better. However in the lab, Mod14 scored higher. I think this tool uses one of the older Vienna energy models. Perhaps with Nando's energy model mods it could do better.

I also have many other cases where I spent hours fine tuning designs to try and squeeze out the last bit of oomph from the energy model by "optimizing" a starting design. In most cases the starting design scored better in the labs. The tweaks often seemed to utilize some "feature" for which the energy model gave a special energy bonus (like a particular boosted loop).

So these days (especially given the 100 designs per shape) I tend to glance at the dot plots rather than pore over them, and use a "shotgun" approach, creating many slightly varying designs on a common theme, and tend to prefer designs that fold similarly in both energy models available in the lab tool; that would seem to help avoid model-specific gains. It worked for me in the miRNA lab at least.

Could I have told you in advance which designs would score in the 80s versus the 60s - nope. However my hope is that the many slight variants will give the folks who want to tweak the energy models some ammunition to help better attack that task, so that some day we will be able to use the dot plots for fine grained prediction.

Just my experience - perhaps yours will differ.

Omei Turnbull, Player Developer

  • 1026 Posts
  • 332 Reply Likes
Thank you for sharing your experience, Jeff.

I think the main difference between what you describe and what I am investigating is that I've come to think that it is important to estimate KDON and KDOFF separately and then derive the Eterna scores, rather than estimate the Eterna scores directly from the energy model predictions. As long as the KDON is not "too" high and KDOFF is not "too" low, there is no best value for either. It is the ratio between them that is important -- the larger the ratio, the better the switch.

For an exclusion lab, how do I think one can lower KDON? By reducing the possibility that the bases in the MS2 hairpin do anything other than form a hairpin. This can be viewed as a form of "clean dot plot" heuristic, except that the points in the dot plot that need to be considered are only those involving one (but not both) of the bases in the hairpin.

To raise KDOFF in an exclusion lab, we want to minimize the possibility that the MS2 hairpin is formed for the subset of foldings that are possible when the FMN is present. This can also be expressed as a "clean dot plot" heuristic, where the area of interest is the 7 base pair dots that represent the MS2 hairpin.
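
A hedged sketch of that heuristic: treat the dot plot as a dictionary of pair probabilities, approximate the chance that the whole MS2 stem forms as the product over its pairs, and let the apparent Kd scale inversely with that chance. Both the pair-independence assumption and the Kd scaling are simplifications of mine, not Omei's actual technique:

```python
def hairpin_prob(bpp, stem_pairs):
    """Probability that every pair of the MS2 stem forms, assuming the
    dot-plot pair probabilities are independent (a rough simplification).

    bpp:        dict mapping (i, j) -> base-pair probability (the dot plot)
    stem_pairs: (i, j) coordinates of the base pairs in the MS2 stem
    """
    p = 1.0
    for ij in stem_pairs:
        p *= bpp.get(ij, 0.0)
    return p

def apparent_kd(kd_intrinsic, p_stem):
    """If only a fraction p_stem of molecules present the hairpin, the
    apparent Kd rises by 1/p_stem (simple two-state picture)."""
    return kd_intrinsic / p_stem if p_stem > 0 else float("inf")
```

Computed from the FMN-free dot plot this gives a KdON estimate; computed from the FMN-constrained dot plot it gives a KdOFF estimate, and their ratio feeds the switch subscore.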

Does that make any sense?

jandersonlee

  • 555 Posts
  • 131 Reply Likes
I agree in principle that for an exclusion lab, KdON is lowered by making the MS2 hairpin more likely in the FMN- form, which is what I tried to measure using the energy model above. Having a clean dot plot for those bases is one qualitative way to judge it, and the related quantitative approach is to estimate the frequency of the MS2 arm occurring in the in-vitro folded sequences. That's what I estimated using RNAsubopt and some math.

Likewise to raise KdOFF in an exclusion lab, you probably want the MS2 arm not to appear when the FMN aptamer is present (by design). Qualitatively that could be viewed as a clean dotplot in that area for FMN+ and quantitatively it is a low predicted frequency of the MS2 arm in the FMN+ condition. That's also what I tried to do, quantitatively rather than qualitatively.
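The counting step described here could be sketched as follows, assuming the structures were sampled offline with ViennaRNA's `RNAsubopt -p` (Boltzmann sampling); the helper names and the example pairs are illustrative, not jandersonlee's actual math:

```python
def pair_table(db):
    """Map each paired position to its partner in a dot-bracket string."""
    stack, pairs = [], {}
    for k, c in enumerate(db):
        if c == '(':
            stack.append(k)
        elif c == ')':
            i = stack.pop()
            pairs[i], pairs[k] = k, i
    return pairs

def motif_frequency(structures, motif_pairs):
    """Fraction of sampled structures containing all pairs in motif_pairs.

    structures: dot-bracket strings, e.g. from `RNAsubopt -p 1000`.
    motif_pairs: [(i, j), ...] 0-based base pairs of the MS2 hairpin stem.
    """
    hits = 0
    for db in structures:
        pt = pair_table(db)
        if all(pt.get(i) == j for i, j in motif_pairs):
            hits += 1
    return hits / len(structures)

# Toy example: the "motif" (0,6),(1,5) forms in 2 of 3 sampled structures.
samples = ["((...))", ".(...).", "((...))"]
motif_frequency(samples, [(0, 6), (1, 5)])  # -> 2/3
```

Run once on structures sampled without the FMN bonus (estimating the ON frequency) and once with it (estimating how well the aptamer excludes the MS2 arm).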

One challenge is that the energy model is just a model, and the dot plot is therefore just an estimate. Perhaps the bonus I used was off, or perhaps my math was off, but the predictions I derived from the energy model were not good enough to accurately determine which of Mod12 or Mod14 would score better, simply that both were potential candidates.

If you can figure out how to do so more accurately, the more power to you. However, I'd advise you to see if you can get ViennaUTC's secret sauce from Nando, since that energy model seems a bit better than most.
nando, Player Developer

@jandersonlee: the energy model used by Vinnie is no secret: https://github.com/ElNando888/vrna-hack

It's essentially Vienna 2.1.9 with a few features and a few parameter tunings.

The energy model doesn't say anything about switches though, so Vinnie has its own evaluation function. While I won't share this code (it's really too ugly), I'd be glad to explain and discuss the topic if anyone is interested.
Omei Turnbull, Player Developer

Nando, I'm interested in hearing about how Vinnie evaluates switches, over and above running the energy model with and without the FMN constraint.
nando, Player Developer

I tried many different algorithms over the years. The one I'm currently using is simplistic and probably fails to take dynamic factors into account. But in a few words, the basic principle consists in considering the hypothetical situation where the design is so perfect that there are only 2 possible states, unbound and bound, and how this would affect the partition function (dot plot).

In this hypothetical situation:
  • a pair appearing in both states should have probability 1

  • a pair appearing only in the unbound state should have a probability X

  • a pair appearing only in the bound state should have a probability Y

  • a base that stays unpaired in both states should have probability 0 everywhere


A long time ago, I gave a "lecture" about thermodynamics.
https://docs.google.com/presentation/...
This graph shows the basic relationship in a switch.

For the purpose of creating a scoring algorithm, I also assumed that the "perfect" switch should have ∆(unbound,binding-ready) = ∆(bound,unbound). So, taking the case where FMN gives 4 kcal/mol bonus, the difference should supposedly be 2 kcal/mol.

If we round things up, this means that the unbound conformation should be 5^2 = 25 times more populated than the binding-ready shape in the no-FMN condition. (in 5^2, the 2 comes from 2 kcal, and see https://getsatisfaction.com/eternagam... for where the 5 approximation comes from)

So, now wrapping up:
Population unbound: 25 / (25+1) = 0.96 (this gives us the X mentioned above)
Population binding-ready: 1 / (25+1) = 0.04 (and that's the Y)

And Vinnie simply measures how much the designs diverge from that "perfect standard" to score them.
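The population numbers above fall straight out of the Boltzmann factor; here is a minimal sketch, assuming only standard thermodynamics at 37 °C (the function name is illustrative):

```python
import math

# RT at 37 degrees C in kcal/mol (R = 1.987e-3 kcal/(mol*K), T = 310.15 K)
RT = 1.987e-3 * 310.15  # ~0.616 kcal/mol

def two_state_populations(ddg_kcal):
    """Boltzmann populations of two states separated by ddg_kcal.

    Returns (p_low, p_high): occupancy of the lower- and higher-energy
    state, i.e. the X and Y from the bullets above.
    """
    ratio = math.exp(ddg_kcal / RT)  # roughly 5x per kcal/mol
    return ratio / (ratio + 1.0), 1.0 / (ratio + 1.0)

# Half of a 4 kcal/mol FMN bonus:
x, y = two_state_populations(2.0)  # x ~ 0.96, y ~ 0.04
```

With the rounded 5-per-kcal approximation, `exp` is replaced by `5 ** ddg_kcal`, and 2 kcal/mol gives the 25:1 ratio, hence X = 25/26 ≈ 0.96 and Y = 1/26 ≈ 0.04.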
jandersonlee

I like your reasoning and analysis Nando. Thanks!
Omei Turnbull, Player Developer

Thank you, Nando!

It's interesting to see how different simplifying assumptions lead to different analyses.

Have you looked at how well Vinnie's scoring predicts the MS2 lab results?
nando, Player Developer

No I haven't. Or rather, a very quick look already tells me a lot: the spread of scores for Vinnie's designs shows that the static factors (pairing probabilities) are clearly not the whole story; otherwise most of Vinnie's designs would be winners. And this is why I mentioned earlier that I thought this evaluation function, while useful, is probably too simplistic. Intuitively, I'd say that it simply fails to take into account dynamic factors (free energy landscapes, with their funnels and barriers), which explains why I'm looking into them at the moment.
salish99

Finally got around to doing a little stats analysis on the whole R93 data.
This is a small excerpt for the cluster data, Kd,on and Kd,off.
First, the cluster data:



and this one shows that the data cannot be fitted with standard statistics, due to the large tail at the "1 cluster" value. Another argument for ignoring single clusters in the future.
salish99

Nobody got a comment? You all agree?
salish99

Then, the Kd,off




Kd,on (domination by those SS values)


Eli Fisker

Switches need higher entropy than static designs. I think there is a connection between repeat bases and higher entropy.

I basically think base repeats are a way of raising entropy. Repeats in themselves are not enough; the ratio between repeats also matters for the result. I think we need to raise the amount of repeat bases when solving switches - that is, except for A repeats.

Putting some of the RNA switch puzzle pieces together

I was reading a new science paper on riboswitches. It says that entropy is higher for RNA designs that are also riboswitches.

Pablo gave a lecture on entropy in relation to switches some time back and mentioned that switches land in a higher entropy range. (Here is an intro by Machinelves and a link:

Entropy, RNA and free energy.

Usually our good static designs of the past had low entropy - meaning they were highly ordered and unlikely to take on many other forms - although low entropy was no guarantee of a winner (Ding on entropy). Designs with high entropy, on the other hand, were always bad.

However, since naturally occurring riboswitches and our RNA switches need to be able to shift shape, it makes perfectly good sense that they will also need to have higher entropy.

What particularly caught my attention in the new switch paper was the collection of riboswitch sequences on pages 31-33. I thought I saw a raised amount of repeats compared to what I considered normal for an average static RNA design, so I brought out my Indian ink. :)

Paper:
Secondary structural entropy in RNA switch (Riboswitch) identification













How to raise entropy?

All these repeats in the riboswitches made me wonder if there was any connection between repeats and high entropy.

What has been the telltale of high entropy designs earlier? Bingo - repeat bases. :)

The bad scoring designs of past static labs not only scored badly, they often contained a hideous amount of repeats. Something I was not pleased about.

Back then we had rather small designs, and I later learned that repeat bases are more welcome in certain places - in particular in longer stems and, for U repeats, in bigger loops.

Base repeats in natural riboswitches

I also noticed that natural RNA switches often seem to be riddled with repeat bases, at a different frequency than normal static designs, and I was perplexed by the heavy G and C repeat base sequences. I thought of GC pairs - and lots of them - as the hallmark of making things stable, which was not something we were too interested in for switches. But while there was a huge extent of G and C repeats, there were also exceptions, so I didn't know what to make of it.

Collection of screenshots of riboswitches:

Rfam picture archive of riboswitches

Patterns in the natural riboswitches set

Dataset:
https://docs.google.com/spreadsheets/...

Here are the patterns I see for the riboswitches I have been playing with:


  • There is an unusually low amount of A repeats - sometimes just above, but most often below, the highest amounts of repeat bases of the other colors. In the static RNA labs, A repeats are usually the most frequent type of repeat base.


  • There is a higher than usual rate of G and C repeats. And Us as well.


  • There often is a relation between the number of A and U repeats, and between C and G repeats. No surprise there. It's similar to the relation between A and U bases and between C and G bases, because of Watson-Crick pairing.


  • There is a different repeat base ratio for switches compared to static designs. Whereas static designs, as a rough estimate, seem to have 20-50% repeat bases - depending on factors like stem length and structure in general - the riboswitches more often have a repeat base ratio of 30-50%. I'm guessing that where a static design often will land in the 30% repeat base range, a switch more often lands in a 40% repeat base range.


  • Even should static and switch designs have the same amount of repeat bases, the "colors" of the repeats are not distributed in the same way. There are fewer A repeats for the switches.


  • Generally the entropy of switches is high: between 0.8 and 2.8 (there can be multiple peaks), with an average of 1.8, which is pretty high compared to what is normal for static designs.


  • I was wondering if the areas that are supposed to be non-switching have low entropy compared to the supposedly switching areas. It seems to be the case so far. So while Vienna is only meant for single state puzzles, it seems to reveal the switching areas in the positional entropy drawings. Still, there are many repeats in the static parts of the switches too - something that I have been wondering about. I might have found a possible explanation.


  • Typically it is the shorter riboswitches that show less of a pattern for the repeats.


  • Ensemble diversity is lower for short sequences - kind of makes sense, as the shorter the sequence, the fewer alternative pairing options there will be.

    Ensemble diversity is how much time the RNA stays in the actual "target" shape. When we want RNA to stay in one target shape, we want ensemble diversity to be low.

    Ensemble diversity often isn't as low for many of the switches as it is for the static designs.

    Therefore it makes sense that when we want the RNA to change between multiple shapes, we may want ensemble diversity to be higher.


  • C repeats - somewhat close together, but most often not as close as G repeats. Sometimes there are 3 C repeats in a short row, with similar distances between them.


  • Short sequences = lower entropy than longer sequences.



This also might explain why there is a minority of riboswitches that don't have the general patterns I noticed - the raised amount of repeats, and C and G repeats in particular. These exceptions are usually the short ones. Short riboswitches often don't have as high entropy as the bigger ones. (I think this is general for short RNAs versus long RNAs, be they switch or not.

And I know that natural RNA, when really huge, also gets a lot more long repeats. But even though growing RNA length, for both switch and static, means more repeat bases, the ratio and the kind of repeats are not the same.)

The short exceptions among the riboswitches don't have the same amount of repeat bases, or of C and G repeats. I think being short means they can compensate in other ways. They often don't even have the repeat bases I otherwise found in many of the riboswitches.

RNA switches and base repeats

In the early switch labs I noticed that repeat sequences (not base repeats) seemed to spread like wildfire in some of the labs. Periodic repeats in rna switches

Now, a lot of the switch sequence repeats seemed to be caused by the spread of FMN aptamer repeats. As Brourd demonstrated with the TEP design, the TEP aptamer doesn't spread the same sequence repeats as the FMN aptamer does.

But most of the natural riboswitches don't even have FMN aptamers, and still most of them have an unreasonably high rate of repeat bases compared to static labs. And looking at the TEP design with that in mind, this particular design has an unusually high rate of repeat bases (55%).


Image taken from Brourd’s comment

Even in switches where the repeats caused by the FMN aptamer were contained in the switching area - due to the switch only being a partial switch - there is still a raised amount of repeats outside of the switching area. Also, the FMN aptamer causing repeats doesn't explain the repeats in the many other types of switches that do not have FMN.

Sequence repeats

I suspect that base repeats are not the only thing that raises entropy. I think that some specific sequences may also contribute more. And it also matters where they are placed: a sequence may cause more trouble if placed in a stem than in a loop, or vice versa.

Switches are also rich in other kinds of repeats that are not necessarily base repeats, such as sequence repeats like ...CUC... and ...GUG..., which are normally less than helpful in static puzzles in bigger amounts. Example from static labs: Strand repetition ban.

The same goes for sequences like GUGG, CUUC and so on. Basically the base pairs are not well enough mixed - too many are turning the same way - to form a stable stem in a static design, especially not if there are multiple of these sequences. However, in switches they thrive.

Sequences like GUG and CUC can make stems unstable if there are too many of them and they continue beyond just a few bases, while G and C base repeats in loops can make nearby stems unstable. Actually, the FMN aptamer uses this for its switch mechanism, and even the MS2 hairpin has such repeats, although it hides them in its stem.

And both these kinds of repeats - base and sequence repeats - are high in switch labs and, similarly, in failed static designs.

What sequences do you think will help raise entropy and make switch RNA switch?

Eterna at work

That the frequency of A repeats needs to be lowered for switches, and the A repeat fragments made shorter, also plays fine together with why there needs to be a limit on A's for our MS2 switches. As the story from the Eterna lab goes:

After we got first round results back from the MS2 lab, Johan gave us an update:

“The highest scoring designs had very few clusters, so beware when interpreting the results. “

http://eterna.cmu.edu/web/lab/5448678/

This made me wonder if I could find anything that separated the designs with a low amount of clusters from the winners. I noticed that designs that got low cluster counts had a high A percentage and long A repeats.

This got confirmed with graphs by Brourd, and later jandersonlee made statistics and we got the A meter - thx, Nando!



However, if you check jandersonlee's numbers for UUU, GGG and CCC repeats, those do not score badly compared with the AAA ones.

So it is only the long A repeats that get in trouble, and only if there are lots of them. In natural riboswitches there are regularly 4 G's in a row - something that our lab prohibits - and I have even seen a fiver.

What’s frequency got to do with it?

It is not just that having more G and C repeats is good for switches; there is also something about A repeats that makes them problematic. U repeats also seem to be beneficial to a higher degree than usual for static labs - something that has also been visible in our recent MS2 and mirRNA lab results.

Which reminds me of the Intrinsical labs (http://eterna.cmu.edu/web/labs/past/?...) that were playing with the frequency of A repeats. These labs had forced base frequencies.

When I checked the Intrinsical 8 lab, the designs tend to have super low ensemble diversity, something which is normally counted as good for static designs. But most of the Intrinsical labs have crappy signal-to-noise ratios and they score badly. The longer the stretch of A's before a breaking base, the lower the entropy.

A few frequencies came back with better signal-to-noise ratios than the others, but otherwise most of the rounds came back with bad noise. And the bigger frequency labs didn't have winners.


From Meech's Signal to noise ratio spreadsheet

The trend, however, is that the lower the frequency - the shorter the distance between the A's - the higher the entropy shown in Vienna's estimate. Although there were more winning designs when they were run in the lab. Not all base colors affect entropy equally much.

Perhaps someone remembers these particular Intrinsical labs breaking several lab batches a while back. Now the results from these labs finally seem to be put to some good use. :) Beyond that, we got the needed hard restraints for base repeats. ;)

When I ran some of the Intrinsically Red lab designs - they have monster long red repeats - entropy goes super low, along with ensemble diversity. This is not the case for Intrinsically Blue, where entropy goes high. And in Intrinsically Green, entropy goes super high.

Really long base repeats, ranked by estimated entropy inducement:

High
Green
Blue

Low
Red
Yellow

So it seems that having A repeats, and many of them, is a way to ensure super low entropy = a static structure. No wonder we had low entropy in the classic Eterna labs, where many of us thought it bad to have much other than A bases in single-base areas. Vienna overused bases other than A, and repeats in single-base stretches, spread in an unbalanced way, which made this strategy look bad; and the lab method for obtaining data back then generally showed A's to be beneficial in the loop areas. Something that changed with the Cloud lab.

While red also gives low entropy, red repeats can't be used the same way as yellow repeats, as they cause trouble for the polymerase when there are too many of them.

So as I like to say:

Basically RNA folding is a game of frequencies.

There is no one right answer
on how to fold RNA,
beside the questions,
what length,
what elements,
how many parts?

What color, base frequency and base repeats are needed really just depends on what structure you want to make, and what function you want the molecule to perform.

Repeat the same small sequence with too high a frequency and too close together, and have two such sequence repeats that are complementary, and misfolds are bound to happen. That's unwanted in static designs; however, we can turn it to our advantage for switches.

Stems take different base frequencies than loops, and small designs take different frequencies than big ones. Similar elements vary with size. Big loops take different base frequencies than small loops.

Balancing the repeats

For the riboswitches, the A repeats are lowered much compared to the normal frequency in static labs. In static labs, A repeats are mostly the most frequent kind. In really big loops the repeat frequency is also changed: there the A repeats are most frequent too, just to a more extreme degree, and often with a raised amount of U repeats as well.

Rough guess for ranking of base repeats in different types of RNA designs



The different base repeats affect entropy differently. UU, GG and CC are not alike. If these are added in a loop, the ones with the stronger pull (CC, and then GG) have more power to disturb the fold of a nearby stem, meaning they can make the stem go slightly or very unstable - meaning that they actually help facilitate movement of that stem. The more pulling power and the longer the base repeat, the more potential disturbance. I think this is why U repeats are regularly longer than C repeats, as U repeats are less aggressive.

I basically think we can make better switches if we up the amount of repeats in them - that is, with the exception of A repeats - and keep these repeats in a fine ratio balance.

Why base repeat ratio matters

If static labs generally have a 30% repeat ratio and switches have a 40% ratio, it will make a difference for folding opportunities.

Not only do sequence repeats provoke higher entropy - which is normally bad for getting a solid fold - the repeat ratio also changes how many ways the RNA can pair up.

Let's do a small example. Imagine you have two designs with a different amount of repeats.

Now imagine a riboswitch with 50% base repeats - some of them have that many - and a static design with just 30% repeats. Let's say both designs have 100 bases each.

Now the switch design will have 50 bases that are repeats and the static design will have 30 bases that are repeats.

Let's say the repeats on average are around 3 bases long.

Switch: 50 single bases + 17 repeats = 67 base regions

A raised amount of the repeats will have the option to pair with each other. But overall there are a lot fewer ways the RNA can fold.

Static design: 70 single bases + 10 repeats = 80 base regions.
Now there are so few repeats that not all of them will pair with each other. However, overall there is a huge opportunity for many different RNA folds, with some of those folds being really strong.

I think the repeats may find it a bit easier pairing with a repeat than pairing with single base regions.
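The region arithmetic above can be sketched as follows (the function name and the rounding are my own; the 3-base average run length comes from the example):

```python
def base_regions(total_bases, repeat_fraction, avg_repeat_len=3):
    """Count pairable 'regions': single bases plus repeat runs.

    A run of avg_repeat_len identical bases counts as one region,
    since the run tends to pair as a unit.
    """
    repeat_bases = round(total_bases * repeat_fraction)
    single_bases = total_bases - repeat_bases
    repeat_runs = round(repeat_bases / avg_repeat_len)
    return single_bases + repeat_runs

base_regions(100, 0.50)  # switch: 50 singles + 17 runs = 67 regions
base_regions(100, 0.30)  # static: 70 singles + 10 runs = 80 regions
```

Fewer regions means fewer distinct ways the chain can pair up, which is the point of the example: the repeat-rich switch has a smaller, more channeled set of folding options.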

In the single state design, the repeats are kept in control by A repeats, which are less likely to pair with anything else than other kinds of repeats are. A repeats actively lower entropy. The most interactive repeats, C and G, are kept at a lower rate, and generally the C's are kept shorter than the G's. Also, they are safer placed in stems; in loops they will love to wreak havoc. Which is basically what Vienna generally did wrong in the past classic Eterna labs.

Higher entropy as switching force and C and G repeats as anchors

So I basically think that raised entropy is what unleashes the power to get the switch moving, the repeats greatly help limit the structure's folding options, and the G and C repeats are the skeleton of the switch mechanism. Or put a bit differently:


  • Raised entropy leaves the RNA wiggle room to change shape.

  • A generally raised amount of repeats limits the overall pairing options and steers the switch towards the correct fold.

  • G and C repeats make strong enough anchors for each of the different states, to make up for the raised entropy. I'm imagining the G and C repeats as a kind of snap fastener.

  • Perhaps the U repeats, which are also often long and more present than usual in static labs, are helping the switch slide.



Things that could be interesting to look at


  • Percentage of repeat bases versus single bases in switch puzzles, and then the same for static puzzles.

  • Percentage of repeat U’s against repeat A’s, percentage of G repeats versus C repeats. Plus a combo of those two groups.

  • Average length of repeats according to base.

  • Optimal distance between base repeats. They seem to be spread throughout the design with no big gaps.


Perspective and thoughts for the future

When I run switch winners through Vienna and watch positional entropy, their entropy is often around the 1-2 range, which is far outside the normal range for a static design. Usually static designs range somewhere between 0.2 and 0.9.
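For reference, positional entropy of the kind Vienna draws is the Shannon entropy of each base's pairing probabilities. Here is a hedged sketch (my own helper, computing in bits from a dot-plot-style probability table; Vienna's own implementation may use a different log base):

```python
import math

def positional_entropy(pair_probs, n):
    """Shannon entropy per position from base-pair probabilities.

    pair_probs: dict {(i, j): p} with 0-based i < j, as read off a dot plot.
    n: sequence length. Returns a list of per-position entropies; the
    average is the kind of single number quoted above.
    """
    probs = [[] for _ in range(n)]
    for (i, j), p in pair_probs.items():
        probs[i].append(p)
        probs[j].append(p)
    out = []
    for plist in probs:
        q_unpaired = 1.0 - sum(plist)  # leftover probability: unpaired
        terms = plist + [q_unpaired]
        out.append(-sum(p * math.log2(p) for p in terms if p > 1e-12))
    return out
```

A base locked into one pair scores 0; a base split 50/50 between two partners scores 1; spreading probability over more alternatives pushes values up toward the 1-2 range reported for switches.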

Now Vienna is only a tool and, as we well know, far from perfect. I wouldn't trust it to do my static lab designing. However, it may still be useful for pointing out some of the really bad designs - as it was in the past.

I basically think we can use entropy as a discriminator for whether a design really switches. If you have designed a switch and it scores below 0.9, then it is likely not a switch at all. If it has entropy above 2.5, it is likely not switching the way you want it to. And if it lands in a good switching range, you still have absolutely no guarantee that your switch will work the way you intended; you only know that there is a good chance that it will actually switch.

I even think we can use the colors of entropy to help determine if the switching is potentially happening in the area we want it most. These entropy range numbers are just a rough estimate, so watch out for what you think of as optimal.

I have been swearing off outside tools for a real long time. But I think Vienna is my new best (old) friend. :)

So, advice for future switch labs where there already are a few winners: try sending the winners through Vienna to get an idea of what entropy range may be smart to aim for - to repeat the success.

Thx to Machinelves for input and discussion.
Eli Fisker

Part II - ROBOTS, REPEATS & ENTROPY

The Return of Vienna

Nando's bot ViennaUTC (Vinnie among friends) has done particularly well in the recent switch labs, and I noticed that it had a generally higher frequency of repeat bases than most of us players. This reminds me of the classic Eterna labs. One of the big problems for ViennaRNA (the scientist algorithm - ViennaUTC is the later version modified by Nando) in the classic Eterna single state labs was its too-high ratio of base repeats, and placing many of them in single-base areas. However, for switches this seems advantageous. Just search for ViennaRNA to see its designs.

Classic eterna designs

I have been complaining about Vienna's uneven energy distribution in the static labs - which it did have. It really looks like I should also have been complaining about its high and uneven entropy distribution. :)

Entropy and even energy distribution

Something else I have been looking out for is whether even energy distribution still holds for switches. I had speculated that the switching areas had a different energy distribution compared to the non-switching areas in a switch. But I haven't been able to determine big differences, as I would have expected. So it seems that what we should be on the lookout for instead is uneven entropy distribution. For a partially moving switch, the static area will typically have low entropy (signifying the structure is stable and non-moving) whereas the switching area will have higher entropy (signifying less stability and likely movement).

Often longer switching stems can be seen having entropy instabilities at the ends of the stems. So it might be worth aiming for destabilization of that part of a switching structure, to help the switch on its way.

So to sum up:

One can solve a static design with even energy distribution and low entropy, but not with wildly uneven energy distribution and high entropy.

One can solve a switch with even energy distribution but uneven and higher entropy.

How to get Vienna to show entropy

I then ran the highest scoring switch through Vienna, just as if it were a single state design. (Here is a demonstration of how to do that: Quick guide to Vienna RNA fold.) Plus, for those interested, some very basic introductions to entropy:

Wonders of Entropy & RNA - Part 1
Wonders of Entropy & RNA - Part 2

I have read that Carna is good with switches, but I'm not sure how to use it, as I cannot get it to eat the FASTA formatting that I give it.

Quick Intro to Fasta

So anyone who can share how to do that shall be most welcome. Also, I think this might benefit our general designing. I suspect this may be a way to discriminate between unlikely switch winners and potentially good designs beforehand - just like Vienna often can tell on a really bad static design (unless it is a Christmas tree style one).