RNAfold

  • Updated 9 years ago
Recently we've been able to make sense of the RNAfold webserver, thanks hugely to alan.robot, who has written a basic dictionary of the terms used on that site. It is available via his player profile, linked below.

http://eterna.cmu.edu/htmls/player.ht...

Applying the RNAfold webserver to the last shape we had, the One Bulge Cross, I came up with the following results.

Synthesized Results.


Based on these results, I think there were maybe 3 solutions being measured here, with varying rates of success. It is important to note that the scores from RNAfold will not necessarily match what happens in nature. But I thought people would be interested in having another possible key measure or two.

So far I'm mostly using Ensemble Diversity and Entropy (the latter being the max value in the graph at the bottom of the RNAfold results screen).

Most Voted and Interesting lab submissions for this round so far. (Ensemble Diversity desc)


Please use these tables as a mere guideline. I am not saying the first result will score the highest. Just letting you guys have as much access to the data as easily as possible, so then you can make up your own mind.
Berex NZ

Posted 9 years ago

Berex NZ

And I have to add for the second table, because evaluating is still a manual process, I only chose those who were either top ranked or most voted, to compare against.

I would love to compare Each and every submission, but I don't have that much time to accomplish that, at the moment.
Chris Cunningham [ccccc]

Here's a topic that tries to compare "Each and every submission" .... well, at least of the ones we've already synthesized. Maybe people in this thread would find it interesting?

http://getsatisfaction.com/eternagame...
Chris Cunningham [ccccc]

Excellent post! I'm glad to see that the RNAfold numbers don't clearly translate into synthesis scores -- I was worried that maybe we as the players weren't necessary at all!

The fact that "43 - even better than..." has horrible scores on RNAfold means [to me] that we shouldn't rely on this tool very much.

Edit: but I am still excited to find good uses for it!
Ding

Something that should be interesting to watch as the results for deep_thought's newer designs based on "43 - even better than 42" come in :)

Here's some links for RNAfold results on the series (note that I don't know how long they keep these on the server, so they'll probably go dead at some point):

43 - even better than 42
44 - a new hope
45 - the RNA folds back

Where "43" has a projected MFE frequency of only 6.18%, "44" is at 71.23% and "45" is at 96.06%.

Similarly, ensemble diversity (which should be low, it's the average number of nucleotides that differ between formations) in "43" is 7.46, in "44" it's only 0.87, and in "45" it's 0.09.

Finally, the positional entropy ranges are 1.2 for "43", 0.6 for "44", and only 0.2 for "45".
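
For anyone who wants to see roughly where numbers like these come from, here is a minimal sketch in Python. It assumes the usual textbook definitions (ensemble diversity as the mean base-pair distance between two structures drawn from the ensemble, and positional entropy as the Shannon entropy of each nucleotide's pairing distribution, in bits); I can't confirm this is exactly what the webserver computes. The function name and the toy probabilities are made up for illustration.

```python
import math

def ensemble_metrics(pair_probs, length):
    """pair_probs maps (i, j), 0-based with i < j, to a base-pair probability."""
    # Mean base-pair distance between two structures sampled from the ensemble.
    diversity = 2.0 * sum(p * (1.0 - p) for p in pair_probs.values())

    # Collect, for every position, the probabilities of each possible partner.
    partners = [[] for _ in range(length)]
    for (i, j), p in pair_probs.items():
        partners[i].append(p)
        partners[j].append(p)

    # Shannon entropy (bits) over "paired with j" plus "unpaired" outcomes.
    entropy = []
    for probs in partners:
        unpaired = max(0.0, 1.0 - sum(probs))
        terms = probs + [unpaired]
        entropy.append(-sum(p * math.log2(p) for p in terms if p > 0.0))
    return diversity, entropy

# Toy 6-nt example (invented numbers): one near-certain pair, one coin-flip pair.
toy = {(0, 5): 0.99, (1, 4): 0.5}
diversity, entropy = ensemble_metrics(toy, 6)
```

In this toy case the coin-flip pair contributes almost all of the diversity and gives its two nucleotides an entropy of 1 bit, while the near-certain pair stays close to zero.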

Something else to note about this series: if you compare the graphical output for the MFE and centroid formations even in "43", they look identical. As I understand it, that means that most of the variations projected in a sample are basically the same shape, it's just a matter of some bonds forming vs not forming, rather than bonds forming between separate regions of the molecule. This could help explain why the lab results were pretty good for "43" despite RNAfold's predictions.
Chris Cunningham [ccccc]

neat ... I'm excited about this.
dimension9

I LOVE the idea, and respect the work put into it, but I'm concerned that the entropy numbers do not reflect the entropy graph or color plot very well - for example, I cannot find where my entry has the 0.50 reading you have assigned to it. Also, my entry has one of the lowest entropy graphs of all, and it has virtually none of the negative "cool" colors in the positional entropy color plot (a Very Good Thing), yet in your spreadsheet, these strengths are not apparent at all.

Also, I think that when you publish a sheet like this, it is imperative to at least also post an interpretation guide, so people will know which figures SHOULD be low instead of high, or many may come away with wrong impressions of strengths and weaknesses of these designs.

It should also be made clear that many designs with very high MFE Structure Frequency percentages have awful looking Positional Entropy Color Plots, and some designs with less high MFE percentages have extremely favorable Positional Entropy Color Plots. Since these color plots are not available here, and not all players who read this will be inclined to go look at them, I fear that many may judge by MFE Frequency alone, which is probably not a good thing to do.

Perhaps, since you are going to the effort to gather all this data and publish it, maybe you could also include links to all the results themselves, so people can easily go get the whole picture.

So - Great, Great Work!!! BUT, PLEASE be sure to post interpretive guidelines, and make sure these numbers are accurate and reflect the whole picture. :)

Thanks, and Best Regards,

-d9
Ding

In defense of Berex NZ, he provides a link to alan.robot's profile, which has a lot of the explanations you request -- it probably would have been overkill to repeat it all here.

I ran your design (PentaPuppy2) through RNAfold; the results can be found here, though I don't know how long that link will be good for.

The 0.5 entropy figure can be found either in the graphic output when looking at it with Positional Entropy display option checked or as the y-axis of the Position vs Entropy graph. In both cases, it's the top end of the range.

To be fair though, the structure only reaches the top of the range at the two nucleotides in the locked portion of the structure that many if not most designs so far have trouble with in RNAfold (the G at 1 and C at 93 in One Bulge Cross or 111 in Star).

As for the comment about designs with high MFE Structure Frequency percentages having awful looking Positional Entropy Color Plots, I've found that a lot of that is an artifact of having different ranges assigned to the colors, where dark blue is the highest for that particular design, rather than a constant. That means that there's a lot more detail in the graphical output for designs with very small ranges of entropy.
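
To illustrate the color-range point, here is a small Python sketch. It assumes the plot simply divides each design's own entropy range into equal color bins, which is my guess at the behaviour rather than anything taken from the RNAfold source; `color_bins` and both entropy profiles are invented for the example.

```python
def color_bins(entropy, scale_max, bins=5):
    """Map each entropy value to a bin index 0..bins-1 (0 = lowest-entropy color)."""
    if scale_max <= 0:
        return [0] * len(entropy)
    return [min(bins - 1, int(bins * s / scale_max)) for s in entropy]

calm = [0.00, 0.01, 0.02, 0.01, 0.06]   # invented profile with a tiny range
noisy = [0.0, 0.2, 0.4, 0.2, 1.2]       # invented profile with a wide range

# Per-design scaling: each design is stretched over the full palette.
relative_calm = color_bins(calm, max(calm))
relative_noisy = color_bins(noisy, max(noisy))

# One fixed scale shared by every design.
fixed_calm = color_bins(calm, 1.2)
fixed_noisy = color_bins(noisy, 1.2)
```

With per-design scaling the two profiles get identical colors even though one is twenty times flatter; on the shared scale the calm design stays entirely in the lowest bin.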

In sum, I agree with some of your concerns -- RNAfold can be a huge help to us, but we need to make sure we understand what we're looking at, and remember that fundamentally the reason we have lab at all is that projections like the ones it gives us don't always match what actually happens. But I hugely appreciate the work both Berex NZ and alan.robot have put into helping us get to that point.
dimension9

First, I feel terrible that anyone felt they had to "defend" Berex (as this implies a perception of an "attack"). I can assure everyone that I did not intend any of my comments as unkind criticisms; they were intended only to be constructive, and most certainly not as an attack. My intent was only to respectfully convey my concerns, in everyone's best interest. So I feel quite saddened that I may have come across as being too critical or unkind in any way. If I did, I offer most humble apologies to both Berex and Ding, and to anyone else who may have thought that my raising of concerns was in any way malicious, thoughtless, or otherwise offensive.

I wanted to add a suggestion that might help to make the "entropy" figure more clear, since the scale in question seems to be a relative scale, where the 2 bases in that top section of locked structure appear at an even higher number when the remainder of the graph is lower and flatter (which is a very positive feature). If you look at several of these, you'll begin to notice that the ones with the lower, flatter (better) graphs invariably have higher scale numbers, which makes it seem (to me) a hugely misleading figure to use. I theorize that this may be because on very low, flat entropy graphs, those spikes in the locked portion ARE higher, but only relatively.

In fact, if this suspicion proves to be true, then the scale reading would actually have to be interpreted in the inverse: it would be an indicator not of high overall entropy, but rather of a greater distance between the locked spikes at either end and the general level of the rest of the graph. A higher scale number would therefore indicate a better, lower entropy.

One suggestion to replace this "scale" figure might be to use instead a manual count of the number of all bases which show as "cool" colors in the "positional entropy" view - the lower the number of "cool" color bases, the better.

Perhaps others may have even better ideas or suggestions on that count.

Also, I fear that having the list sorted by the Frequency of MFE percentage could be criticized as showing a bias toward those in the high range of that only partially evaluative figure. Many, especially those not yet familiar with alan.robot's wonderful explanations, may see this table and just go vote for the top "Frequency of MFE" designs, and not even consider the whole entropy portion, or not consider it correctly. Therefore, although it is obviously an innocent "convenience" sort, I think the list should at the very least be alphabetical to avoid any misunderstanding or appearance of bias, albeit unintended.

In support of these thoughts, in the introduction to his latest informative paper, "A meta-analysis of one-cross-bulge results I: positional entropy and what it means", alan said:

"In this tutorial, I’ll show you an example of how positional entropy can be used to help predict winning and losing designs, even before you submit!"

Therefore, since the author's stated focus was on Positional Entropy, I am concerned that one unintentional result of Berex's otherwise Fine and Commendable Effort - was that it appears to emphasize the importance of "Frequency of MFE" over Positional Entropy, and falls short in adequately conveying the importance of alan's stated focus on "Positional Entropy."

Also, using the scale number as a measure of the design seems (to me) to be inappropriate, as it appears, as mentioned above, that this scale is not fixed across designs. To associate the value of a design with what appears to be a variable sliding scale figure may do many authors and their designs a disservice, by making them appear to be less well constructed than they actually are, when in fact the Positional Entropy Color Plots seem to indicate just the opposite.

In closing, let me say that I do realize, that my perceptions may be in error, and that my concerns may be perhaps ungrounded.

So I also counsel and request that we wait until alan can weigh in on this question, so we can get a more authoritative take on these thoughts - before grappling too much further in pursuit of a conclusion for which we are not even yet sure we have enough adequate correct information to base any conclusion upon.

Again, I do sincerely apologize for any poorly phrased or worded thoughts in any of my communications that may have seemed unkindly critical, or that might have come across as offensive in any way.

Respectfully,

-d9
Ding

I'm sorry if what I said caused you regret. I probably overreacted a bit (I sometimes do). No harm no foul?

We're all still trying to get used to reading this output, and still have very little data to help us figure out which numbers will be the "best" predictors of actual success in the lab. Honestly, I think we should all be looking at all of it, rather than any one metric (and I understand your concern that people will pay attention only to the MFE frequency %, especially since you've lived through more rounds of "lowest free energy is best" than I have).

Counting colors of nucleotides in the graphical output is flawed though, since the colors change based on the range. Similarly, the overall "bumpiness" of the position vs entropy graph increases as the range of entropy decreases -- making a "bumpy" graph not necessarily bad, provided the range is small.

Perhaps the best single metric I can think of is to look at the maximum entropy in the position vs entropy graph not including the locked portions.

By that metric, PentaPuppy2 tops out at about 0.08 rather than 0.5, and Berex Star Two tops out at about 0.02 instead of 0.06.
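
As a rough sketch of that metric in Python (the function name, the locked positions, and the short entropy profile are all invented for illustration; real values would have to be read off the position vs entropy graph):

```python
def max_unlocked_entropy(entropy, locked_positions):
    """Largest positional entropy among nucleotides that are not locked.

    Positions are 1-based, matching the x-axis of the position vs entropy graph.
    """
    values = [s for pos, s in enumerate(entropy, start=1)
              if pos not in locked_positions]
    return max(values) if values else 0.0

# Invented 6-nt profile where only the locked ends spike.
profile = [0.50, 0.03, 0.08, 0.02, 0.01, 0.50]
locked = {1, 6}

overall = max(profile)                             # what the plot's scale reports
unlocked = max_unlocked_entropy(profile, locked)   # the proposed metric
```

Here the overall range is set entirely by the two locked ends, while the designer-controlled part of the molecule never gets above the much smaller unlocked maximum.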

Of course, looking at that for a whole bunch of entries is a lot more labor-intensive than just checking the overall range including the locked area.
dimension9

Heyyy Ding - Having inadvertently stepped on some other toes recently, I am also perhaps a bit over-reactive. So yes, no harm, no foul.... we're good :)
...and yes, I think those figures should be re-cast as you described. It would be the only fair thing to do. A corrected version could then be re-posted so voters can see accurate entropy; it could make a big difference if people base votes on it.
Madde

Maybe this is a good place for everyone to post the RNAfold results of their lab submissions?

Alpha Centauri
Beta Centauri

It looks like Alpha Centauri won't fold properly, so please don't waste your votes on this one.
Beta Centauri looks way better. :)
dimension9

Hi madde,

We have yet to prove a definite link between this RNA Server's results and Lab Results, so it may not be a certain failure. I counsel patience and vigilance on first results before undercutting your design so - remember, many designs that look perfect in EteRNA have done poorly in synthesis, and I know from personal experience that several designs that I initially dismissed ended up being top scorers. Still, however, I do commend your modesty and selfless concern for others' votes - so much that you will warn people off your own design. (Now I want to vote for it just to see what it really does do!)

One CRUCIAL NOTE: Please click the box "Positional Entropy" before linking your designs; the buttons do not seem to work afterward, and this is where the critical information is revealed - in the Positional Entropy Colors!

(Also, I can only see these colors in Chrome, not Firefox. Firefox only shows black & white letters, which is not very useful. Does anyone know if this is a plug-in issue?)
Madde

Hmm, I did click on "Positional Entropy" before linking (the URL doesn't change, though) and the button still works for me (Chrome 9)
Berex NZ

Wow ok.
Uhm, you are absolutely right d9 that these tables could be misleading. They were never meant to be an authoritative source, which is why there is no interpretation guide.
I'm certainly not going to say low is best, when that is why people create christmas trees to get the lowest MFE, a strategy I deem to be flawed. I left it up to people's own interpretation, hoping to jumpstart their curiosity so they strive to learn more via the link provided.
I agree max entropy isn't perfect, but I figured it to be more useful. When there is a more useful and accurate way to calculate average entropy, I'll most probably throw that in too.

You are absolutely welcome to write up your own similar table with figures you deem to be more relevant. I just wanted to share this with everyone and hopefully open a few more eyes. :)

PS. If you look carefully, I never sorted by MFE. First table was sorted by Synthesis scores descending and the second table was with Ensemble Diversity ascending.
Ding

I'll post my spreadsheet of results of running some of the current lab designs through RNAfold to supplement.

I chose the designs based on how clean their pairing probability graphs looked to me (the new lab tool you get when you click on the eye in the tool menu). That's a decidedly subjective measure. I looked through all the designs that had been submitted as of this afternoon (anything with an ID number below 327744), made a list of which probability graphs seemed cleanest, and ran them all through the server.

I've included columns for the MFE (minimum free energy) structure frequency %, the Ensemble Diversity, the overall range of Positional Entropy as given in the position vs entropy graph at the bottom of each result page, and my eyeball estimate of the highest entropy point not including the nucleotides that are locked.

Mandatory disclaimer: this is for informational purposes only. I make no claim that any of these would outperform any other design in there. I have included a couple of very-high-GC-content designs that in my opinion would most likely fail to synthesize at all, just because they had pretty dot graphs. If we want to test the upper limits of GC content for this shape, these might be good ones to test with :)

Also, the order is strictly according to ID number, from first-submitted to most recent submission. It makes it a little harder to compare stuff, but should avoid any appearance of preference for one metric over another.



Also, a note: I was going to include URLs for the results page for each, but learned that links would only be good for up to 48 hours and decided against it. The results should be reproducible by copying the sequence for each design (which we can now do easily, yay!) and entering it into the RNAfold Server. Under "Advanced Options" make sure that "unpaired bases can participate in at most one dangling end" and "RNA parameters (Turner model, 1999)" are selected.
alan.robot

Hey guys - I wish I was a bioinformaticist so I could say something more useful to you, other than I think you are waaaay overanalysing. . . Great to see the enthusiasm, and super great that you've proved to yourselves that the algorithm doesn't know everything.

I posted the tutorial just to get people *thinking* about entropy instead of just the single lowest MFE that eterna shows. Because down that path lies dragons, and possibly christmas trees.

The entropy is actually almost impossible to calculate accurately. So although this output talks about an "ensemble", it uses a really crude approximation instead of explicitly computing the millions/billions of suboptimal folds that are possible which would be the correct way to do it. So the MFE frequency has to be taken with a whole bag (not a grain) of salt because the computer is super-guessing here, and that's even assuming the energy function was perfect which we know it's not. . .
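
To make the "ensemble" idea concrete, here is a minimal Python sketch of what an MFE frequency means if you could list the energy of every fold explicitly: each fold gets a Boltzmann weight, and the MFE frequency is the MFE fold's share of the total. The energies are invented, the temperature is assumed to be 37 C, and this shows only the general principle, not RNAfold's actual computation.

```python
import math

RT = 0.0019872 * 310.15  # gas constant (kcal/mol/K) times 37 C in kelvin

def mfe_frequency(energies):
    """Share of the ensemble occupied by one lowest-energy structure.

    energies: free energies (kcal/mol) of every structure being considered.
    """
    e_min = min(energies)
    # Weights are taken relative to the MFE so the MFE itself has weight 1.
    weights = [math.exp(-(e - e_min) / RT) for e in energies]
    return 1.0 / sum(weights)

# Invented energies for a handful of competing folds.
frequency = mfe_frequency([-25.1, -24.6, -24.3, -24.3, -23.0])
```

Every extra fold added to the list can only push the MFE frequency down, which is why a long tail of near-MFE misfolds matters so much for this number.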

You CAN do that for yourself, however (the suboptimal fold calculation), it's linked on my new tutorial on my profile. It's not for the faint of heart, however, and it takes *forever* compared to RNAfold, so no global analysis of all entries will be feasible. Easier to just think about it in your head as you design and try to figure out why designs went wrong, the general principles apply to many of the tips ccccc has on his profile, for example. . .

That the RNAfold algorithm can sometimes tell between an awful design and a decent one "up front" is the only real firm point I wanted to make, and even in that case it's just if your free energy is way too high - it can't warn you at all if it's too low, that's a kinetics problem.

If you were hoping for a divining rod for winning entries, this is most definitely not it. It'll probably end up being no more or less useful than MFE and Tm were to begin with, which is to say, not very, but more tools can't hurt.

Cheers and keep up the good work,

Alan
alan.robot

I was super intrigued by the apparent contradiction that "43 - even better than.." has a very high entropy but scores well. Ding's hypothesis that the misfolds, on average, still keep the right overall features was too good not to inspect further, because if true, this is an excellent example of using entropy TO YOUR ADVANTAGE in an RNA design.

So I did a full subopt analysis. My parameters were:

RNAsubopt -d1 -noLP -e 15 < 43.seq | sort -nk 2 > 43sorted.out
barriers --rates -G RNA-noLP --max 50 --minh 0.4 < 43sorted.out > 43barriers.out

FYI this took 10 minutes and a gigabyte of ram & harddisk so not something you want to do on a web server.

I then visualized the lowest-scoring misfolds using VARNA
http://varna.lri.fr/demo.html

And it pretty unequivocally shows that Ding's hypothesis is supported by the detailed suboptimal fold calculation. The MFE is -25.1 kcal/mol, so these folds are all less than 0.8 kcal/mol different from the MFE, and therefore highly likely to be populated at equilibrium.
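
For a feel of what "less than 0.8 kcal/mol" means in population terms, here is a back-of-the-envelope Python sketch using a plain Boltzmann factor. It assumes 37 C and simple equilibrium; the function name is just for illustration.

```python
import math

RT = 0.0019872 * 310.15  # kcal/mol at 37 C (assumed temperature)

def relative_population(delta_g):
    """Equilibrium population of a fold relative to the MFE fold.

    delta_g: how far the fold sits above the MFE, in kcal/mol.
    """
    return math.exp(-delta_g / RT)

ratio = relative_population(0.8)
```

Under these assumptions a single fold 0.8 kcal/mol above the MFE is still a bit more than a quarter as populated as the MFE itself, so a few such folds together can easily outnumber it.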

Note how the presence of individual (or even multiple) bulges doesn't compromise any of the rest of the structure from being correctly formed. The experiment will read out an average of everything in the test tube, so this is a good example of entropic stabilization of a design by using just the right number of GC base pairs, in my opinion.




If you want to see more clusters, the full subopt output is here:
http://dl.dropbox.com/u/15086981/43ba...

which can be visualized with the VARNA applet.