Note on the Accuracy of Eterna algorithm in the real world

  • Updated 9 years ago
I'm sharing this figure to warn future lab designers: you should EXPECT the rules for real-world RNA folding to differ substantially from the winning strategies in the puzzles. Below is a summary of the accuracy of state-of-the-art RNA base-pairing prediction algorithms on an array of experimentally verified RNAs.



You can see it varies from 60% to 90%, depending on the example. Therefore, lab designs are not just about crossing t's and dotting i's! There is a lot that the game algorithm flat out cannot capture.

However, the chance that most parts of the right answer are contained somewhere in the top 750 suboptimal structures is nearly 100%.
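As a rough illustration of what an accuracy number like this can mean, here is a minimal sketch that assumes accuracy is the fraction of reference base pairs that also appear in the prediction. That metric, the function names, and the toy structures are all assumptions made for illustration; the figure may use a different definition.

```python
def parse_pairs(dot_bracket):
    """Return the set of (i, j) base pairs in a nested dot-bracket string."""
    stack = []
    pairs = set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def fraction_of_pairs_recovered(reference, predicted):
    """Fraction of reference base pairs that the prediction also contains."""
    ref = parse_pairs(reference)
    if not ref:
        raise ValueError("reference has no base pairs")
    return len(ref & parse_pairs(predicted)) / len(ref)


def best_over_candidates(reference, candidates):
    """Best score among several candidate structures (e.g. suboptimal folds)."""
    return max(fraction_of_pairs_recovered(reference, c) for c in candidates)


# Toy, made-up structures (not real data).
reference = "((((....))))..((...))"
predicted = "((((....)))).(.....)."
print(fraction_of_pairs_recovered(reference, predicted))
```

The single best prediction can miss pairs that some other candidate gets right, which is why the score over a whole list of suboptimal structures can be much higher than the score of the top one.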

If you want to learn more about suboptimal folds, I have a rough attempt at an explanation here.

Note: this figure was extracted from slide 25 of these class notes, and the original citation is:

http://www.ncbi.nlm.nih.gov/pubmed/10...

alan.robot


Posted 9 years ago


alan.robot

Note: In the above post, I am referring to the accuracy of the minimum free energy structure, which is what Eterna shows in "natural mode". Eterna does include some consideration of suboptimal folds in the "dot plot" view, but that is only one of several existing methods for calculating suboptimal folds.

For more details about how the prediction algorithm works, as well as explanations of several different methods of calculating suboptimal structures, see the following reference: doi:10.1016/j.jmb.2006.01.067

http://novacripta.cbm.uam.es/bioweb/c...

dimension9

Thanks for these info links, alan.
I also am suboptimal;

JRStern

so, let's see if I follow, apparently nature has some further metrics for selection besides optimal energy (etc) according to these models? perhaps there are some forbidden designs we don't know about? perhaps there are some further optimization calculations we don't know about?

i've wondered if there are some "terrain" factors, that a somewhat suboptimal design may be locked in and for good reason, if it is well isolated from other solutions, stuff like that.

Ding

This is kinda off-topic, but I'm curious about pseudoknots.

I'm pretty sure I understand what they are: you have an RNA with segments A, B, C, D spaced along the sequence, and A bonds to C and B bonds to D, rather than A to B and C to D, or A to D and B to C (in other words, the pairs don't produce nicely nested dot-bracket thingies).

And as I understand it, most algorithms used to predict secondary structure can't cope with them at all.
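The "not nicely nested" condition can be checked mechanically. The following is a minimal sketch, assuming base pairs are given as 0-based index tuples; the function name and the toy segment positions are made up for illustration. Two pairs (i, j) and (k, l) cross when i < k < j < l.

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return every couple of base pairs (i, j), (k, l) with i < k < j < l."""
    normalized = sorted((min(i, j), max(i, j)) for i, j in pairs)
    crossings = []
    for (i, j), (k, l) in combinations(normalized, 2):
        # After sorting, i <= k, so a crossing means the second pair
        # opens inside the first and closes outside it.
        if i < k < j < l:
            crossings.append(((i, j), (k, l)))
    return crossings


# Toy layout: segments A = 0-2, B = 4-6, C = 8-10, D = 12-14.
a_to_c_and_b_to_d = [(0, 10), (1, 9), (2, 8), (4, 14), (5, 13), (6, 12)]
a_to_d_and_b_to_c = [(0, 14), (1, 13), (2, 12), (4, 10), (5, 9), (6, 8)]
a_to_b_and_c_to_d = [(0, 6), (1, 5), (2, 4), (8, 14), (9, 13), (10, 12)]

print(len(crossing_pairs(a_to_c_and_b_to_d)) > 0)  # pseudoknotted
print(len(crossing_pairs(a_to_d_and_b_to_c)) > 0)  # nested
print(len(crossing_pairs(a_to_b_and_c_to_d)) > 0)  # side by side
```

Only the A-to-C plus B-to-D arrangement produces crossings, which is also why it cannot be written with a single kind of bracket in dot-bracket notation.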

What I'm wondering about is how prevalent they are. I see in the above graphic that they range from 0% to 14% in those specific examples. Is that a percent of base pairs, or the percent of nucleotides participating in pseudoknots out of all nucleotides, or something else I'm not thinking of? In the molecules with high levels of pseudoknots, are there a lot of little ones or just one or two big long ones? Is that a representative sample?

Do they mostly occur in RNA with a lot or a little base-paired content vs. loop content? Does length of the RNA matter? Do they mostly occur within a small area of an RNA or across large distances?

And perhaps most relevant to EteRNA specifically, are they something to be thinking about in lab design? Since the algorithms won't recognize the possibility and report it on a dot plot or in the RNAfold results, is that something we should be keeping an eye on?

alan.robot

Here's a link to a database of known pseudoknots. Most seem to occur in viruses, but they also occur in larger RNAs such as the ribosome.

http://www.ekevanbatenburg.nl/PKBASE/...

Also, here's a neat article that shows what they look like in 3D.
http://www.plosbiology.org/article/in...

I think it's worth keeping in mind, as they certainly could arise any time you have unpaired regions in big loops and such. But it would be really difficult to detect them even when they do occur, since the SHAPE assay doesn't show what is bonded to what, only whether bases are bonded or not.
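To make the detection problem concrete, here is a small sketch under a strong simplifying assumption: the readout is treated as an idealized yes/no "paired" flag per nucleotide, whereas real SHAPE data are per-nucleotide reactivity values. The function name and the toy pair lists are hypothetical.

```python
def paired_mask(pairs, length):
    """Idealized per-nucleotide readout: True if the base is in any pair."""
    mask = [False] * length
    for i, j in pairs:
        mask[i] = True
        mask[j] = True
    return mask


# Two different toy structures over the same 15 nucleotides.
pseudoknotted = [(0, 10), (1, 9), (2, 8), (4, 14), (5, 13), (6, 12)]
nested = [(0, 14), (1, 13), (2, 12), (4, 10), (5, 9), (6, 8)]

same_readout = paired_mask(pseudoknotted, 15) == paired_mask(nested, 15)
print(same_readout)
```

Both structures pair exactly the same positions, so under this idealization the readout alone cannot tell the pseudoknotted arrangement from the nested one.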

Ding

Thanks, alan.robot!