Tell us about your EteRNA lab algorithms!

We have a very exciting announcement:

EteRNA players are beating state-of-the-art computer algorithms at designing real-world RNAs. Congratulations to our amazing players!

We would like to publish this exciting result so that scientists can learn about what you are doing.

However, we just need ONE VERY IMPORTANT thing before we can publish. We want to fully describe what you have learned and what strategies you use in the
publication. What do you do to select good RNA designs in the lab? Do you have an algorithm, or do you have a set of things you look at?

We're looking forward to delivering great thoughts from the EteRNA community to the rest of the world!

Colored points indicate EteRNA players and gray points indicate computer algorithms. EteRNA players came from behind and won!

Raw SHAPE data of synthesized designs
Jeehyung Lee, Alum

Posted 8 years ago

Eli Fisker

This is amazing news. You ask for our strategy on how to select good designs. Here is what I do.

I look for color patterns in designs that the synthesis results say are good, but I have also learned from the unlucky designs (my own and others' :) ).

When I check a design in RNAfold, I look under the positional entropy option. I compare the color pattern in the EteRNA design against the colored warning points of high entropy (bad if too high) in RNAfold.

By color patterns I mean:
E.g. 2 AU base pairs next to each other, especially if turned the same way, might be a problem. How sensitive the structure is to this seems to change a bit from puzzle to puzzle; some of the puzzles seem to allow it. But with 3 or 4 AU pairs in a row you are usually in trouble, meaning the structure gets unstable there. You might get away with three if they are turned in different directions.

A GU pair, if not carefully placed, has a tendency to break up the area nearby. I haven't solved this mystery yet.

There are more of those color patterns to be aware of, but these are the ones I remember. It's not fail-safe, but it can help you avoid using a pattern in your design that is very likely to fail.
Eli Fisker

And a simple one: always place a GC pair to close the loops. That also applies to the internal loops.

There should be a GC pair at the bottom and the top of a "hairpin/stem/helix" or whatever it is called. As far as I remember, I haven't seen a successful design that didn't follow that "rule".

One to teach the robots :)
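
A minimal sketch of how the closing-pair rule above might be checked by a script, assuming a design is given as a sequence plus a dot-bracket target structure. The function names (`pair_table`, `stems`, `non_gc_stem_ends`) are made up for illustration and are not part of EteRNA.

```python
def pair_table(structure):
    """Map each paired index to its partner for a dot-bracket string."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def stems(structure):
    """Return stems as lists of (i, j) pairs, outermost pair first."""
    partner = pair_table(structure)
    result, seen = [], set()
    for i in sorted(partner):
        j = partner[i]
        if i > j or i in seen:
            continue
        stem = [(i, j)]
        seen.add(i)
        # extend while the next pair inward is directly stacked on this one
        while i + 1 < j - 1 and partner.get(i + 1) == j - 1:
            i, j = i + 1, j - 1
            stem.append((i, j))
            seen.add(i)
        result.append(stem)
    return result


def non_gc_stem_ends(sequence, structure):
    """List the first/last pairs of each stem that are not G-C or C-G."""
    bad = []
    for stem in stems(structure):
        for i, j in {stem[0], stem[-1]}:
            if {sequence[i], sequence[j]} != {"G", "C"}:
                bad.append((i, j))
    return sorted(bad)
```

An empty list means every stem starts and ends with a GC pair; any returned index pair is a stem end closed with AU or GU.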
Jeehyung Lee, Alum

This is really interesting Eli! Just curious - do you have a way to use stack/loop color patterns from past RNA lab data as well?
Eli Fisker

No, I haven't looked much into stacks. The same goes for loop color patterns. Just checking now. There were a few loop color patterns that seemed to be successful: the classic GAAA, some with 2 A's and 2 G's that were working (GAGA), and of course Mat's all 4 A's. And I know there are more from a list of typical loop patterns that work in nature. (For those who haven't seen it: http://getsatisfaction.com/eternagame...)

There were also a few very colorful variations that seem to work. Here are the loop patterns among the winners: Sneh had success with AUAG in the finger. Dimension9 introduced AAGA in the One Bulge Cross, where Berex used UGAA.

In the star, Donald got away with even 4 different and mostly new loops in the same design (GGAA, GAGA, GCAA, GUGA). Deep Thought used two loops of UUCG.

The pattern I see in the loop color patterns of the winners is that there tends to be an "overweight" of loops in especially yellow, followed by red (this might come down to player preference): four A's, three A's and one G, and also different combinations of two A's and two G's.

I think people tend to get their own favorite. Also, it is easier to use and vote on what you know is safe. GAAA :)

I think moving successful loop patterns from old designs to new ones could work very well. Quite a few of the working loop color patterns returned in some of the later and winning design puzzles.
Ding

I think this thread could be a great resource for players starting out in lab also. One of the most common questions I see about lab is "how am I supposed to know what to vote on?"

The first thing I look at is bond content. I pretty much only look at designs in which between 30% and 80% of the bonds are GC, and no more than 10% to 20% are GU. Those are rough guidelines -- in a shape like the Cross or One Bulge Cross where most of the nucleotides are bonded (very little unbonded content) I want the GC content to be lower than in a shape like one of the Stars where there's a lot of unbonded content and a more complex design.
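
As a rough illustration of this first filter, here is a sketch that counts pair types from a sequence and a dot-bracket structure. The function names are hypothetical, and the default cutoffs (30% to 80% GC, at most 20% GU) are just the loose guideline numbers from the paragraph above; they would need adjusting per shape, as described.

```python
def pair_content(sequence, structure):
    """Return the fraction of base pairs that are GC, AU, GU, or other."""
    stack = []
    counts = {"GC": 0, "AU": 0, "GU": 0, "other": 0}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            # sort the two letters so that CG/GC, UA/AU, UG/GU are counted together
            key = "".join(sorted(sequence[i] + sequence[j]))
            key = {"CG": "GC", "AU": "AU", "GU": "GU"}.get(key, "other")
            counts[key] += 1
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: v / total for k, v in counts.items()}


def passes_content_filter(sequence, structure,
                          gc_range=(0.30, 0.80), gu_max=0.20):
    """Apply the rough GC and GU guidelines (cutoffs are assumptions)."""
    frac = pair_content(sequence, structure)
    return gc_range[0] <= frac["GC"] <= gc_range[1] and frac["GU"] <= gu_max
```

For a design where most nucleotides are bonded, one might pass a lower upper bound such as `gc_range=(0.30, 0.60)`; that particular number is only an example.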

Next thing I look at is tetraloops. If the tetraloops aren't either AAAA or one of the known patterns that gets an energetic bonus in EteRNA I check to see if there are mispairing possibilities with a complementary sequence nearby (if for instance the four nucleotides in the tetraloop are AGUA and part of a nearby stack is UAC).
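
The mispairing check could be sketched roughly as below. This is an assumption-heavy illustration: it only considers Watson-Crick complements (no GU wobble), looks for 3-nucleotide matches within a 15-nucleotide window, and treats only AAAA as a safe loop by default, because the list of bonus tetraloops is not spelled out here. `safe_loops`, `k`, and `window` are made-up parameters.

```python
import re

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Sequence that would pair with seq when read 5' to 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def tetraloops(structure):
    """Yield start indices of hairpin loops with exactly four unpaired bases."""
    for m in re.finditer(r"\((\.{4})\)", structure):
        yield m.start(1)


def tetraloop_mispair_sites(sequence, structure,
                            safe_loops=frozenset({"AAAA"}), k=3, window=15):
    """Return (loop, kmer, position) for nearby stretches that could pair
    with part of a tetraloop that is not in safe_loops."""
    hits = []
    for start in tetraloops(structure):
        loop = sequence[start:start + 4]
        if loop in safe_loops:
            continue
        lo = max(0, start - window)
        hi = min(len(sequence), start + 4 + window)
        for offset in range(4 - k + 1):
            kmer = loop[offset:offset + k]
            target = reverse_complement(kmer)
            pos = sequence.find(target, lo)
            while pos != -1 and pos + k <= hi:
                # ignore matches that overlap the loop itself
                if pos + k <= start or pos >= start + 4:
                    hits.append((loop, kmer, pos))
                pos = sequence.find(target, pos + 1)
    return hits
```

With the AGUA / UAC example from the paragraph above, the GUA inside the loop is flagged against each nearby UAC.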

Then comes any multiloop. In the case of One Bulge Cross (the first lab I was around for) this would mean the center intersection. Fairly early on in the rounds it became clear that there were two central patterns that seemed to hold up better than others, so I looked for those. In the case of the more recent labs, it has meant making sure that any non-Adenine nucleotides in the unbonded parts of the multiloop don't look like they're going to interfere with the formation of that loop.

Next is the dotplot. I usually look at this as it is, then make any tetraloops in the design AAAA and take another look, then make them all GAAA and take a third look. This is harder to set rules for. I'm not necessarily looking for an absolutely perfect dotplot, but some things I like to see are an absence of what I think of as "shadow lines" running parallel to the lines we want to see (which suggest that two sections of the sequence may mispair rather than just a single bond), and not too much variance between the dotplot for the design as is, the design with AAAA loops, and the design with GAAA loops.

If the designer has given RNAfold stats I'll look at them, otherwise I rarely run sequences through RNAfold or other servers anymore. If I do, I'm looking for an MFE% of over 80% for designs with AAAA tetraloops or over 90% for designs with "stabilized" tetraloops, ensemble diversity under about 0.5, and entropy range under about 0.3.
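
These cutoffs can be written down as a small check. The sketch below does not call RNAfold; it assumes the three numbers have been read off by hand, and `FoldStats` and `meets_fold_thresholds` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FoldStats:
    mfe_percent: float          # MFE% as reported by the designer or server
    ensemble_diversity: float
    entropy_range: float


def meets_fold_thresholds(stats, stabilized_tetraloops):
    """Apply the rough cutoffs: MFE% over 80 (AAAA loops) or over 90
    (stabilized loops), ensemble diversity under 0.5, entropy range under 0.3."""
    min_mfe = 90.0 if stabilized_tetraloops else 80.0
    return (stats.mfe_percent > min_mfe
            and stats.ensemble_diversity < 0.5
            and stats.entropy_range < 0.3)
```

The cutoffs are "about" values in the text, so borderline designs probably deserve a second look instead of a hard reject.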

Finally I look at colorpatterns as Eli Fisker has described, and quad energies. I don't like to see any quad energies over -0.9 kcal (a UA UA or AU AU quad). If there are GU bonds used, I like to see them stabilized on one side with a GC in a configuration that gives -2.1 or -2.5 kcal. I also don't like more than 3 AU pairs in a row, even if alternating orientation (though if we ever have another shape with significantly longer stems this may change). I also look for GCs at the beginning and end of all stems, though I won't rule out a design that has one or two stems closed with AU especially in early rounds when I think we're still testing the tolerance of the shape.
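
The "no more than 3 AU pairs in a row" part of this check can be sketched as follows, assuming a sequence plus dot-bracket structure as input. It counts directly stacked AU or UA pairs regardless of orientation; the quad energies are not computed here, and `longest_au_run` is a made-up name.

```python
def longest_au_run(sequence, structure):
    """Length of the longest run of directly stacked AU/UA pairs."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            partner[stack.pop()] = i  # opening index -> closing index
    run, best = {}, 0
    for i in sorted(partner, reverse=True):  # innermost pairs first
        j = partner[i]
        if {sequence[i], sequence[j]} != {"A", "U"}:
            run[i] = 0
            continue
        # the pair (i + 1, j - 1) is stacked directly on (i, j)
        stacked = partner.get(i + 1) == j - 1
        run[i] = 1 + (run[i + 1] if stacked else 0)
        best = max(best, run[i])
    return best
```

A design would be flagged under this guideline when `longest_au_run(sequence, structure) > 3`.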

In deciding whether to vote for a design or not, I base the decision on both the above considerations and what I know about other designs that have received votes. If for instance most of the designs being voted up are at the high end of what I consider "desirable" GC content, I'm more likely to vote for a lower-GC design (again, especially in rounds one and two of a new shape). In later rounds, if we already have four or five modifications of a successful shape voted up to the top, I'm more likely to vote for a new modification of a different design, or an entirely new design.

edit to add: in all the above I left out two important things that are harder to quantify. First: the "neck" of the design (the stack nearest the open loop). We're still having a lot of trouble forming this in all shapes, so I go by my most recent gut feeling about it. Second, the comment section. Especially for designs that are modifications of previous designs this is important -- I like to see what the designer is trying to correct and how.
Eli Fisker

I think you are right about a closing AU pair at the top of a stem. If the nucleotides are played wisely, a few might be allowed.
Eli Fisker

A comment on Ding's words: "I also don't like more than 3 AU pairs in a row, even if alternating orientation (though if we ever have another shape with significantly longer stems this may change)"

Yes, there seems to be a difference in which color patterns are allowed, depending on the length of the string in the puzzle. The longer the string, the less rule-dependent/sensitive the design seems to be. (Or the rules for what is allowed change.)

In the finger, which is basically a long string with loops in it, Mpb21 made a row of 6 AU base pairs, and it stayed perfectly together. (Good design)

In the cross, which had long strings as well, Mat made a working row of 5 AU pairs, but the two with 6 AU pairs broke apart. (Mat cross design V2)

But this would not have been possible in any of the later puzzles, with shorter strings and big inner loops.

Theory: The longer the string, the less sensitive it is to color patterns in general, like 2 same-turning AU's beside each other and GU's relatively close to each other.

In the finger designs you can see loops being held together by GU (Pure butter) and AU (DRV_a-Perception 10)
Eli Fisker

One of the robots just managed to make a row of 5 AU pairs. (Nupack bot design 1, The branches, 80 % synthesis) But that doesn't mean that it usually will work.
jerryfu, Alum

Hello Ding,

The dev team is trying to create a script out of your strategy. I want to clarify something before I finalize the script. In your original post, you wrote:

"If the tetraloops aren't either AAAA or one of the known patterns that gets an energetic bonus in EteRNA I check to see if there are mispairing possibilities with a complementary sequence nearby"

Could you tell me about the known patterns that get an energy bonus?

Thanks for your help!
Eli Fisker

Comment to my comment above: One of the robots just managed to make a row of 5 AU pairs. (Nupack bot design 1, The branches, 80 % synthesis) But that doesn't mean that it usually will work.

Now I have a better idea about why this pattern is tolerated in the neck and why it is not recommended elsewhere in the design.

See explanation here in What's so special about the neck?
Joshua Weitzman

I saw a name that had won multiple times, and everyone else had picked that design, so I picked it too to get the points. Then I would design something similar to other designs so I could get more points. You have only had five cycles of the game and you're going to publish; why not get more data? You don't even have asymmetrical patterns. The lab is flawed: there are only 50 people playing, and the same people win regardless of the merits of the design. 100,000 is a joke; you are not going to get 33,334 people designing RNA if they see the same people winning. As I have said before, the lab is a popularity contest.
Joshua Weitzman

It will be stagnant in its current configuration. For every 1 person you get to contribute to your lab, you will need to have 413 people sign up. If you wanted 100 additional contributors to the lab, you would need 41,300 signups, 1.5 times your current enrollment. For the goal of 100,000 syntheses per month you need 3.2 million new recruits; that's larger than the top podcast shows. They can spend hundreds of thousands on their infrastructure and bandwidth. Your current business model is good for selling high-end luxury cars, not for developing a human-based multiprocessor computing system that can do better than current computers. http://youtu.be/hSVo4ejZ7rc Only fifty people contributing to the labs will not cut it. You will need thousands of people contributing to make your statement true. Right now you could say Eterna players show promise of beating state-of-the-art computer algorithms constantly. You're going to publish your statement, have a rush of people come to sign up, and only harvest a handful of contributors to your lab, in turn souring the experience of the people who took the time to come to your game but did not contribute to the labs. But if you take the time and make some simple changes to your reward system that would make a larger % of people feel like they are contributing, you could capture a larger % of people that would contribute. I could not run a business on a small advertising budget if one in 413 persons that came in bought something. Your idea has good bones, but it needs to be worked out with criticism and critical thinking, not glad-handing. If it was a business I was running, I would experiment with the rules to increase your capture % before publishing a once-in-a-lifetime statement like you are planning on doing.
alan.robot

@Jonathan: I think your comments are all spot-on with what devs and players alike are worrying about, so don't feel as if you are alone and unheard in these concerns. Bridging the puzzle->lab gap is probably the biggest and most significant challenge Eterna is going to have to address, and both the devs and the top players are thinking really hard about this.

I, in particular, think that GU challenges have been sending advanced players all the wrong messages, as I can't think of any problem that would realistically be solved in such a way. And that has nothing to do with the Eterna model per se, but rather the game-mandated constraints on what a solution should look like, and I do think those could be made to be more realistic (for example, keeping GC/AU ratios sane).

It is impressive that you might mistake any of the top players for trained biochemists. As far as I know, none have any formal post-secondary exposure to biochemistry or molecular biology; it's all stuff they picked up along the way as they became obsessed with getting better at the "lab" portion. There are some grad and undergrad students playing as well, but AFAIK none are in the top 100 or so players, and none of the players getting synthesized have any such training. So, if it's any consolation for the rough transition to lab designing, none of the others who successfully bridged the gap have had any special training other than being meticulous, observant, and creative.
Jonathan Hall

@Alan (and Berex):

My impressions about biochemists involved with the lab were just general impressions from snips of chat and comments. Someone asked whether RNA synthesized elsewhere could be added to the lab database; people pass around articles and RNA analysis websites like they're common knowledge. The few players' backgrounds I do know are not in biochemistry, but my concern is the impression a lab newcomer has.

My suggestion is that we maintain an up-to-date basic lab guide. It would include:
--basic terminology
--basic goals (is 90% a success?)
--instructions on how to interpret lab results
--basic principles learned so far
--a few examples with explanations about why certain pairings failed
--a quick rundown of the conventional wisdom
--links to helpful spreadsheets, conversations, and websites

This would have made me feel a lot better jumping into the lab. I keep saying to myself, "When I have the time, I'm going to go through all the lab results (whatever we're up to now) and analyze what works and what doesn't." Of course, the more results come in, the more daunting such a task is. I'd be happy to work on such a guide, but I can't promise it would happen anytime soon.

As for the puzzle play vs. lab design gap, I think it would be ideal to have a percentage score for the puzzles (i.e., your design has an n% chance of folding correctly). Different difficulty levels would have a different target percentage. Of course, this would require a more advanced algorithm, since currently a mostly GC puzzle seems to be the strongest.

I do think the GU challenges help identify the weakest spots in a structure, even if that is their only practical merit. I definitely think that competitions in the puzzle realm (both the GU sort and the designing a puzzle that few can solve sort) are important in attracting and retaining puzzle-solvers who like a good challenge. I'm only guessing, but I would think Joshua is still active on Eterna only because he had fun doing all the GU challenges and now designing difficult player puzzles.

(Sorry for the mishmash of comments and ideas, especially as I notice that I am not telling anything about my Eterna lab algorithms....)
alan.robot

Great comments, Jonathan! Hopefully others are taking note.

I remember being particularly underwhelmed when, upon reaching 10k points, the only instructions for jumping into "lab" were the two youtube videos, and links to very old forum posts.

I had expected some sort of communal "lab notebook", like a wiki that anyone with lab access could edit (note that requiring an account with 10K points to edit pretty much eliminates random vandalism a la Wikipedia; any player should be able to view content, of course).

I think there are many players who would pitch in to organize and fill such a wiki with useful guides and content, but right now we are pretty much restricted to getsat posts, which really aren't amenable to this sort of endeavor.
chaendryn

Wow, Ding ... awesome post :)

I follow a similar strategy to Ding. An additional step I use is to run some sequences through the Barrier server to see whether there are any obvious kinetic traps that would cause a suboptimal shape to dominate - http://rna.tbi.univie.ac.at/cgi-bin/b...

On designing: if the Barrier results for a design that synthesized above 80 in early rounds and 90 in later rounds show a problem area, I'd modify it and run it through again to see whether that potentially clears the problem area.

Something else that tends to influence my decisions on voting for a design is whether the person has bothered to explain their thinking. Perhaps they're testing a theory or have a reason for modifying an already synthesized design. I'd be more inclined to vote for one of those designs above a 'no comment' design or one that doesn't give any insight into what the person is trying to achieve by submitting their design.

Will spend a bit more time thinking about it and see whether there's anything else I can add :)
Eli Fisker

Another thing. There tend to be problems in the bigger (inner) loops if there are colors other than yellow, besides the area where the loop is connected to a stem. In the challenge puzzles it was sometimes a necessary strategy to place extra nucleotides in the ring of the loop to stabilize the structure. But in lab it often makes the RNA structure fold in the wrong places. It messes up the dot plot a bit as well.

The more other-colored nucleotides, the worse. A few may be okay. But generally the winning designs tend to avoid them.
Eli Fisker

Anyway, Dermochelys' 95% synthesis design got away with placing one green beside each stem in the inner loop. (assymetries GC bounds in branches, The star) But the usual scenario is that, more often than not, there is a penalty for placing things beside the stems. There might be a few golden combinations.
Ding

Something I noticed in the Star designs: it seemed to be pretty much okay to place a G or a C just before a stack in the central multiloop (so the segment was AAAC or AAAG), whereas putting them just after a stack (GAAA or CAAA) seemed to cause problems.
blubblub

We have been testing our color template with each eteRNA RNA synthesized puzzle challenge winner and have come to the conclusion that it may be 'limiting' to assign only one color to each nucleotide: blue for uracil; green for cytosine; yellow for adenine; and red for guanine. There may be a higher color template that may help identify nesting patterns within the 'good' and 'problem' folding patterns to achieve 100% RNA synthesis. This may be more important as more complex challenges are released.

To quote Tevye in 'Fiddler On The Roof' -- "sounds crazy, no?!"

This is the 'boohoo' part. In order to actually test this idea the DEVS would have to produce an additional color template that allowed someone like us (blubblub) to assign any of 8 colors to the four nucleotide colors you have now. The colors would be black, white, blue, yellow, green, red, purple and orange.

In other words -- the eteRNA folding and energy template and strategies underneath the current design still operate unchanged. However, players like Eli Fisker who use color to drive their intuition might find the ability to identify or discover new nesting patterns inside an existing design pattern useful.

Among other benefits (if it works) is that nesting patterns could signal chemical and folding preferences within designs that may not be obvious right now. (If it works) the designs would be more articulate. (If it works) it would make any successful RNA design predictive.

One last comment. Marie Curie processed tons of pitchblende to isolate the radium whose radioactivity she and Pierre had detected. In the end, she had only a tiny amount of radium for her efforts, but it completely changed the world.

This collaboration to unravel the complexity of RNA is unparalleled. It is the stuff of a new paradigm. ENJOY!
Berex NZ

blubblub, what is your background in this area? You seem to know a lot...
blubblub

@Berex -- Thank you for asking... my background, training and Industry Consultant Degree are in Information Flow Management. I managed the State of Wisconsin and University System Account for AT&T Communications in the 80's. During this period, I was lucky enough to personally meet with Alan Huang before he became the head of Optical Computing at Bell Labs. He told me "all information is rooted in light". I had already been studying Base-2 as the mathematical language of information since 1972, and I have been looking for the path to light ever since. It is my passion. My slide show and color template can be found at: https://picasaweb.google.com/collinsm...#
duanev

I was thinking I could probably create a program that solves the puzzles the way I do - do y'all have a simple way to represent the challenges in a data file? And a verification program (that declares a puzzle solved, and computes the total kcal for both incomplete and completed puzzles)? If there is interest in more grey-dot generators that can compete with players >:), post the above and some more bot code might be forthcoming...
JRStern

Well, we cheat, of course!

The computer algorithms take a single shot.

The humans try stuff, get feedback, and then evolve towards the higher scores.

Frankly, I'm underwhelmed by it all.

I think the human results are simple mathematical outcomes of the game theory roles of the humans versus the algorithms. Wrap the algorithms in a multi-pass structure, even a dumb one, and you'll equalize the outcomes.
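
To make the "dumb multi-pass structure" concrete, here is a toy sketch of such a wrapper: each round proposes variants of the current best design, reads back a score, and keeps any improvement. Everything in it is hypothetical. `design_once` stands in for a single-shot algorithm, `score` stands in for a round of synthesis feedback, and `toy_score` is a meaningless placeholder. Whether a real RNA design algorithm can actually be driven this way is a separate question.

```python
import random

BASES = "AUGC"


def multi_pass(design_once, score, mutate, rounds=5, per_round=8, seed=0):
    """Naive multi-round loop: propose variants of the best design so far,
    score them, and keep the best. Returns (best_design, score_history)."""
    rng = random.Random(seed)
    best = design_once()
    best_score = score(best)
    history = [best_score]
    for _ in range(rounds):
        # one "round": all candidates are derived from the best at round start
        candidates = [mutate(best, rng) for _ in range(per_round)]
        for cand in candidates:
            s = score(cand)
            if s > best_score:
                best, best_score = cand, s
        history.append(best_score)
    return best, history


def point_mutation(seq, rng):
    """Replace one randomly chosen position with a random base."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(BASES) + seq[pos + 1:]


def toy_score(seq):
    """Placeholder feedback signal (fraction of G); not a real synthesis score."""
    return seq.count("G") / len(seq)
```

For example, `multi_pass(lambda: "AAAAAAAA", toy_score, point_mutation)` returns a best design and a score history that never decreases from round to round.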

As for the voting, I'm with JW: the way to score is to pile onto the popular ones. The actual differences between the models are minuscule, and I have zero reason to believe that the best candidates are being selected for synthesis. Well, that's a little too harsh. More accurately, you could take any of those that score points for being similar and probably find several that are superior to the one that was synthesized. This might actually merit doing. If the ones selected even rank regularly in the top third, you could call that a success, but I wonder if they would.
Jeehyung Lee, Alum

JRStern,

First, I wouldn't say evolving via trial and error is cheating. That is fundamentally why people are better than computers at certain things and why we believe in human computation.

Second, if you look at the results, people beat the computers from round 1 of "Bulged Star." And in the new shape "The Branches" this is true as well. This clearly shows that players are not just blindly submitting designs: they have ideas on why some work and why some do not, and we want to figure that out.

Wrapping algorithms in a multi-pass structure: this is precisely what most machine learning researchers in the world dream of doing. There is no easy way of generally modifying existing algorithms for multi-pass trial and error. Existing RNA design algorithms were fundamentally NOT designed to take in priors, and it's impossible for them to do multiple passes unless they are redesigned from scratch.

As I pointed out to JW, popularity is not blindly based on how much people like specific players. Popular players earned their fame through past performance and their ability to write, which is essential in research. Indeed, we have no guarantee that the best candidates are being selected. But I also have no reason to believe candidates chosen by players would be worse than a random pick.
Peter Stampfli

I like to do the puzzles and to make lab designs. However, I do it intuitively and have no rules, only experience.
I make lab designs from scratch; often I begin with all AU bonds and change things... ending up with designs similar to Mat's or Ding's or others' ...
this finally gives me some lab reward points.

For this reason I have no idea how to vote. I simply vote for my own design to get some more points.

I participate because for me it is amusing and I have no particular ambitions.
penguian

I have a few meta-strategies (i.e. cheating) to make my designs more likely to be voted up:
(1) Get in early in the first round. In the first round of a new design, I get in very early, and try to use standard puzzle algorithms (AU on stacks, GC on corners, GU for variety, one G somewhere in short loops, etc.) I then try to increase energy by flipping, then change more GC to GU to see if still stable. At this point I start checking the dot plot, and flip and change until I can make it as clean as I can. Next I check against RNAFold. If I get 90% or more there, I am satisfied and publish.
(2) Try to use a snappy name. My first names were nuclear tests (Starfish Prime), then story characters (Trillian, Ford Prefect, Frodo, Bombadil), then names related to animals (currently amphibians: Axolotl, Toad Hall, Kermit).
(3) Be successful in earlier rounds, so voters know who you are.
(4) Get in early in later rounds. This gives your design more time to accumulate votes. If your first design is good enough and early enough, your second and third designs in the same round, even if far "better", will never catch up in votes.
(5) Recycle your "better" designs from earlier rounds into the next round, with trivial changes, if necessary.
(6) Better still, modify someone else's successful design from the previous round. In later rounds, I modify my own design or someone else's, to try to strengthen the main stack, to clean up the dot plot, to diversify stacks, to increase the percentage on RNAFold. No matter whose design I use, I keep a short snappy name, with no "Modified" or "Version" in the name.
(7) Start your design comment with the RNAFold percentage.
(8) Vote for your own design, so that it starts with one vote.
(9) Vote for "dot-plot clean" designs that have at least 2 more votes than yours - your design won't catch up anyway, and it ensures that designs with fewer votes than yours won't get those slots.
Adrien Treuille, Alum

Fascinating, and totally not cheating. You're playing the game as it was designed, which is very interesting.
Joshua Weitzman

You could do both. Study your lab contributors to make your algorithms, and fine-tune the rules of your game to increase your lab participation. Are your lab groups choosing the best possible design, or are you making an algorithm that will choose the 2nd or 3rd best design? If I wanted to model a system, I would have it at its peak efficiency: have a group, in one cycle, design and select for high % synthesis. I think you would need a slightly larger group to do this constantly. The problem you have with publication is that you need it to grow. Every time you publish you expose new people to your game, which is good. The bad part is that you have a horrible retention rate. And getting people to come back to your game after they have been exposed to something they have rejected is hard. I don't think there is a large population of people that enjoy playing science-based games. The more people you have signed up, the more money you will have to put into data storage. You can reduce your overhead by increasing your retention rate. If you have more than one selection system, I would roll them out in parallel and let them compete to give the optimal metric. The voting system could be as fun as the design system. If people feel like they are cheating, then there is a flaw that is driving down your retention rates.
memory60

I have no lab RNA experience so I only put colors and patterns together which I like to do and like getting rewarded when getting the correct sequence.
Jeehyung Lee, Alum

That's very interesting memory60 - do you have any specific patterns and colors you choose?
Eli Fisker

I have noticed that there seems to be a rule for the direction of the closing GC pairs in inner loops. In most of the winning designs, the GC pairs turn in the same direction. (E.g. GCaaaaGCaaaa... and not GCaaaaCG...)

(RNAfold seems to have a preference for same-direction GC pairs in loops as well. It is visible in the colors of positional entropy if GC pairs in loops are not turning the same way.)

Well, no rule without exception: Donald managed to break it with his Dings's Imp'o'imp, in the star puzzle.

But in designs with multiple loops, such as The branches, the picture might be a bit less clear. There still seems to be a preference for same-directional GC pairs, especially in the middle loop, which holds three big stems together.

In the two loops (in the 20-34-36 area and 59-72-85) I guess there could be a certain allowance for differently turning GC pairs. Starryjess made Y oh Y, with a synthesis result of 84 %. But we only have 5 results with a synthesis % over 80, so I'm only guessing.

If my theory is correct, it may come down to differently turning GC pairs in this area creating a bit of asymmetry in the outer part of the structure, which in turn helps the stems avoid pairing up with each other.

xmbrst

  • 13 Posts
  • 3 Reply Likes
Count me into the hypothesis testing coalition. Just let me know what to vote for.

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
I would love to do this on the highest scoring design so far, Mat's Branches V1. But "unfortunately" he doesn't use tetraloops, so there would be more unknown factors in play than just the opposite-turning GC-pairs and the neck. So I think I will wait till a later puzzle, with a more usual design candidate. But thanks for the support!

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
Hi Adrien and Xmbrst!

Here is the answer to what happens when turning GC-pairs in the opposite direction. Back when I started this post, we didn't have the energy view mode in the puzzles and therefore could not see what was going on. This is a demonstration of what goes on inside a multiloop at the energy level.

Energy forces at work in multiloops

Adrien Treuille, Alum

  • 243 Posts
  • 33 Reply Likes
Very interesting, Eli. Do you feel that the experimental data bears out this observation, or that further tests might confirm / falsify it?

I guess I'm asking because one of the main points of EteRNA is to glean insights into RNA nano-engineering which are not present in existing models, such as the model which EteRNA itself presents to users.

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
Hi Adrien!

This is a mixture of things I have been writing to Jee and Jerry lately, and sort of an overview of what I found out about the direction of GC-pairs in multiloops.

I have been very focused on whether there was a pattern in where opposite GC-pairs in multiloops were allowed (the neck is safest). I think it came down to the fact that I discovered the pattern of direction of GC-pairs before I knew about the energy in multiloops. Now I sort of think that a certain amount of opposite GC-pairs is allowed in multiloops, as long as the negative energy in the multiloop doesn't go below a critical limit. I still think that preferably all GC-pairs should turn in the right direction (red nucleotide to the right), as this gets the negative energy inside the multiloop up and ensures it stays together. It may however be helpful to lower the energy around the neck area, so as to make e.g. a low energy neck (a neck with collectively low energy) work together with the multiloop.

It is only in the designs with asymmetric multiloops (different numbers of nucleotides between arms) that I discovered a pattern in where these opposite GC-pairs in multiloops are most likely to be tolerated, other than in the neck. That is in the top arm, the one with the fewest nucleotides on both sides of it, compared to the others. I will do a post on asymmetric design later, when we have a bit more data. The tendency is that most of the rules for the symmetric designs hold in asymmetric ones as well, just a bit more loosely: it is allowed to stray a bit from some of the common rules.

Jerryfu is programming my strategies for direction of GC-pairs in multiloops. He asked a great question (he needed to know this in order to score designs of this type with my upcoming algorithm):

Do multiloops need at least 3 closing pairs (ie. 1 neck, 2 arms), or 2 (1 neck, 1 arm)?

And he has a point. Things do look different in designs like the finger, where multiloops only have two arms, or one arm and one neck.

I haven't been keen on having the finger design in my algorithms for direction of GC-pairs in multiloops, but hadn't thought up a way to exclude it. The tendency for direction of GC-pairs is not as clear in loops with just two arms, or one arm and a neck. Besides, the data from the Finger lab is contaminated by all the colored nucleotides that are placed in the multiloop ring, so I can't see which is causing the mispairing in the shape data: the nucleotides in the loop ring or the wrong direction of the GC-pairs closing the loop. I do suspect, however, that the direction of GC-pairs does not matter as much here as in loops with more arms. Those (small) multiloops with few arms are not as energetically pressured as the bigger multiloops with more arms, and thus negative energy inside the multiloop is not as important for keeping the structure together. Time will tell with this one.

But for lab designs in general, at least 2/3 and rather 4/5 of all GC-pairs in multiloops should turn the right way. And opposite GC-pairs are best tolerated in the neck.
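The 2/3 and 4/5 guideline can be written down as a small scoring helper. This is only a sketch under stated assumptions: the function name, the labels it returns, and the idea of encoding each GC-pair's direction as the string "GC" or "CG" are made up for illustration, and which of the two counts as "the right way" has to be supplied by the caller.

```python
from fractions import Fraction


def direction_score(orientations, preferred="GC"):
    """Rate a multiloop by the share of GC-pairs turning the preferred way.

    orientations: one "GC" or "CG" string per GC-pair in the multiloop.
    preferred: the orientation treated as "the right way" (an assumption;
    the post describes it visually as "red nucleotide to the right").
    """
    if not orientations:
        raise ValueError("need at least one GC-pair")
    matching = sum(1 for o in orientations if o == preferred)
    share = Fraction(matching, len(orientations))
    if share >= Fraction(4, 5):
        label = "preferred"      # the "rather 4/5" target
    elif share >= Fraction(2, 3):
        label = "acceptable"     # the "at least 2/3" minimum
    else:
        label = "risky"
    return share, label
```

Exact fractions are used so that a share of exactly 2/3 or 4/5 is not lost to floating point rounding.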

Yes, I do think experience and data confirm this theory about the direction of GC-pairs in multiloops. It is not always a 100% rule, but the tendency is clear. If followed, it mostly pays off.

Hope this answers your questions.

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
Hi Jee!

I'm not sure how the algorithms work, but you know how long the string of nucleotides in the puzzle is and the end structure you want it to fold into. So you know where along the string (in which areas) the GC pairs that close the stems are connected to the inner loop (12-93, 90-54, 51-15 in The branches puzzle).

So you could make a computer pick these 3 points (6 nucleotides) in the inner loop and run a check that those nucleotides turn in the same direction.

I'm not sure if the algorithm sees the puzzle as a string or a structure. But if it sees it as a string, you could tell your computer that if nucleotide 12 is red, then nucleotides 51 and 90 should be red as well. And if 12 is red, then 93 should be green, and when 93 is green, 54 and 15 should be so as well.

And the opposite scenario: if 12 is green, then 51 and 90 should be green; then 93 is red, and 54 and 15 should be red as well.

You can then add the same kind of check for the two other big loops with 3 GC pairs in them, if it turns out that there is a benefit from the GC same-direction rule there as well.

You would have to reset this algorithm for each new puzzle, because the GC spots will change.
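The check described above could look roughly like this in Python. It is a minimal sketch, not how any EteRNA bot actually works: the function and constant names are invented for illustration, and it assumes the game's colors map red to G and green to C. Positions are 1-based, as in the post, and each pair is written in the order given there, so the first position of every pair should carry the same base.

```python
# Closing pairs of the inner loop in The branches puzzle, as listed in the post.
# Hypothetical constant; it has to be redefined for every new puzzle.
BRANCHES_CLOSING_PAIRS = [(12, 93), (90, 54), (51, 15)]


def closing_pair_orientations(sequence, closing_pairs):
    """Return "GC", "CG", or None (not a GC pair) for each closing pair."""
    orientations = []
    for first, second in closing_pairs:
        pair = sequence[first - 1] + sequence[second - 1]  # 1-based positions
        orientations.append(pair if pair in ("GC", "CG") else None)
    return orientations


def closing_pairs_same_direction(sequence, closing_pairs):
    """True only if every closing pair is a GC pair turned the same way."""
    orientations = closing_pair_orientations(sequence, closing_pairs)
    return None not in orientations and len(set(orientations)) == 1
```

With the pairs above, a design where 12, 90 and 51 are all G (red) and 93, 54 and 15 are all C (green) passes, and so does the fully mirrored design; a design with one pair flipped fails.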

And you can do the exact same thing with the nucleotides in the big loop that do not close the stems, and check that all are A's, or that the non-A nucleotides are green or red and in a position that might be allowed. You know the position (number in the row) of these nucleotides in the end structure, therefore you know which nucleotides the computer needs to color-check.

Since you asked: I was actually wondering if it would be possible to get the robots to vote on designs, from criteria like Ding's max 0.3 entropy, ensemble diversity under 0.5 and now my GC direction thing? Not that I want them to, I love voting. :)
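A voting robot built on those three criteria might be sketched as below. Everything here is an assumption made for illustration: the `DesignStats` record, its field names, and the idea that the entropy and ensemble diversity values have already been computed elsewhere (for example read off RNA fold by hand). Only the 0.3 and 0.5 limits come from the post.

```python
from dataclasses import dataclass


@dataclass
class DesignStats:
    """Precomputed numbers for one design (hypothetical record layout)."""
    name: str
    max_positional_entropy: float
    ensemble_diversity: float
    gc_pairs_same_direction: bool


def bot_would_vote(stats, entropy_limit=0.3, diversity_limit=0.5):
    """Vote only if all three player criteria are met."""
    return (
        stats.max_positional_entropy <= entropy_limit   # Ding's max 0.3 entropy
        and stats.ensemble_diversity < diversity_limit  # diversity under 0.5
        and stats.gc_pairs_same_direction               # GC direction rule
    )
```

A stricter or looser robot is just a matter of passing different limits, which would make it easy to let several variants vote side by side.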

blubblub

  • 60 Posts
  • 1 Reply Like
First off -- amazing work, Eli! Second, and this is directed to both @Eli and @Jee -- blubblub seconds any initiative to expand the current color template for the four nucleotides. Our ideas are already stated above, and the images with captions can be found at: https://picasaweb.google.com/collinsm...#

Jeehyung Lee, Alum

  • 708 Posts
  • 94 Reply Likes
This is indeed amazing - kind of like a pseudocode.

EteRNA bots can only do designs - their algorithms were fundamentally created to do the design only, and nothing else. Our hope is to make a bot that can actually vote based on player strategies like yours : )

zilagorila

  • 1 Post
  • 0 Reply Likes
I'm not quite sure that my response is what you are asking for, but I thought I'd throw it out there anyway.

I have a strategy for solving the puzzles generally (at least to the point of being accepted by the software). I use the following steps:

1) Put a G-C at the opening of every loop
2) Use A-U pairs on the long stretches, switching between A-U and U-A because this is more stable than putting As or Us next to each other.
3) Put a G in every loop next to the C (or U) of the pair at the mouth of the loop. This is particularly important in small loops
4) Put Gs for single pair loops
5) When a single base needs to be unpaired, make it G, and surround with A-U pairs with the Us on the same side as the G
6) Do whatever is required to further stabilize the molecule. Sometimes this means adding more G-C pairs, sometimes switching the order of a pairing.
7) If the molecule has symmetrical components, make sure that the components each contain something unique to prevent multiple pairing possibilities.
8) Add in the required number of G-Us
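Steps 1 and 2 are mechanical enough to sketch in code. The following is a rough illustration only, with invented function names: it reads the target shape as dot-bracket notation, puts a G-C pair at both ends of every stem (a simplification of "the opening of every loop"), fills the rest of each stem with alternating A-U and U-A pairs, and leaves every unpaired base as A. Steps 3 to 8 are judgment calls and are not attempted.

```python
def pair_table(structure):
    """Map each paired index to its partner for a dot-bracket string."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def starting_sequence(structure):
    """First-pass sequence: G-C at stem ends, alternating A-U inside stems."""
    partner = pair_table(structure)
    seq = ["A"] * len(structure)          # unpaired bases default to A
    flip = False
    for i in range(len(structure)):
        j = partner.get(i)
        if j is None or j < i:
            continue                      # unpaired, or pair already handled
        inner_stacked = partner.get(i + 1) == j - 1
        outer_stacked = i > 0 and partner.get(i - 1) == j + 1
        if not (inner_stacked and outer_stacked):
            seq[i], seq[j] = "G", "C"     # pair sitting at the end of a stem
        else:
            seq[i], seq[j] = ("U", "A") if flip else ("A", "U")
            flip = not flip               # switch A-U / U-A for the next pair
    return "".join(seq)
```

For a simple hairpin such as `((((....))))` this gives `GAUGAAAACAUC`: G-C at both ends of the stem and one A-U followed by one U-A in between.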

I haven't become so enlightened as to optimize this strategy for getting good synthesis results, but it works for the majority of puzzles.

Like Eli, I have also noticed that it is often better to make closing G-C pairs (pairs at the mouths of the loops) in the same direction.

Happy solving!

Jeehyung Lee, Alum

  • 708 Posts
  • 94 Reply Likes
Zila Gorila - totally what we are looking for. Very interesting strategy!

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
Saw a design, no name, no blame :), in the voting area today, so here is what I think.

Example from this puzzle:

UUUU
GAGA

(loop to the left before nucleotides and inner loop to the right after)

For memorization: Lady Gaga should not sing UUUU in your puzzle :)

No rows with 4 U's in a line, no matter what their combination of G's and A's is.

This applies to nucleotides of any color. In general there often is a penalty for having more than two repeated nucleotides in a row.

That applies both if the basepairs only share the same colored nucleotide on one side of the string and if the whole basepair is repeated.

Two repeated (same-turning) basepairs like 2 AU's, sometimes even 2 GC's too, can make trouble. 2 GU's beside each other are a problem no matter how they turn.

The puzzle already by default rules out using four greens or reds in a row.

So the general rule about repetitive nucleotides is:

Twist 'em baby!
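For anyone who wants to scan a sequence for such repeats automatically, here is a small sketch. The function name and the default run length of 3 are choices made for this example; the check looks only at the sequence string and knows nothing about which bases sit in loops or stems.

```python
from itertools import groupby


def repeated_runs(sequence, min_length=3):
    """List (base, start, length) for every run of identical nucleotides.

    start is the 1-based position of the first nucleotide in the run.
    """
    runs = []
    position = 1
    for base, group in groupby(sequence):
        length = len(list(group))
        if length >= min_length:
            runs.append((base, position, length))
        position += length
    return runs
```

For example, `repeated_runs("GAUUUUCGGGA")` reports the four U's starting at position 3 and the three G's starting at position 8; with `min_length=4` only the U run is reported.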

Ding

  • 94 Posts
  • 20 Reply Likes
I wholeheartedly agree here. Even 3 Us in a row is (in my opinion) asking for trouble. Maybe because of the tails which we have no control over being mostly As.

Really, the only place I want to see three of any nucleotide in a row is if it's As in loops.

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
Yes. 3 U's in a row, or 3 G's, C's or A's, is a bad idea. 4 is a really bad idea.

Funny and true end comment.

wisdave

  • 27 Posts
  • 1 Reply Like
Hmmmm.... That wasn't my design, but I do have 3 G's in a row in one of the branches. When I flip the GU pair, the stack drops from -1.5 to -2.5, and the total drops from -59.1 to -59.4. The dot plot is minimally improved. I'll remember this in future puzzles. Thanks for the insight.

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
I noticed one more thing. As the neck area seems to be a constant battleground for us, I'm beginning to see a reason why it is so hard for us players to keep this part of the puzzle together.

My theory is that another rule we seem to work by interferes with our understanding of how to make the neck area stick together.

Interfering rule:
If a string holds more than two GC-pairs, sometimes there is a penalty for having them all turn in the same direction, even if they are not right beside each other. Sometimes it is even necessary to twist one of the two GC basepairs closing the ends of the string, as it improves the stability of the structure and makes the entropy look better.

(I think this rule has to do with breaking symmetry and thereby preventing the structure from folding in the wrong places.)

The neck area is tricky and usually a lot of GC pairs are needed (one robot made a functioning neck with only two GC pairs: Nupack bot design 1, The branches). I think we unconsciously do as we usually do in other strings. That means that we often make the closing GC pairs connected to the inner loop turn in a different direction than the closing GC pairs at the end of the neck. Those who use CGCG (11-12 green/93-94 red) in the neck near the loop and the different-turning GCGC (6-7 red/98-99 green) in the neck closing seem to be in trouble.

Actually, we might learn from the robots. They seem to be onto something when it comes to the neck. Maybe it is because they start from the neck area and work their way in, while we maybe have a tendency to start from the middle and forget that nucleotides 1 and 2 are red and will pair up with whatever pair of green nucleotides they can find their way to.

Eli Fisker

  • 2223 Posts
  • 485 Reply Likes
I was stating that there sometimes is a penalty for having all GC-pairs in a string turn in the same direction, even if they are not right beside each other, and that twisting one or two (if there are more than 2 GC pairs in the string) is helpful. (It should often be the GC-pair closing the tetraloop, so as not to interfere with the same-turning rule of GC-pairs in the multiloop.)

I saw a picture of a miRNA helix in the book Genetics: A conceptual approach and had the thought: maybe the reason we sometimes need to twist one or two GC-pairs in a string has to do with the spiral structure of the double-stranded part of RNA, or in other words, with helping the RNA start spiralling and be more stable.

What is the cause of the spiral structure in RNA and DNA anyway?

Let me hear what the rest of you think about this.

xmbrst

  • 13 Posts
  • 3 Reply Likes
Voting strategy:

1) Look at designs with GC content at the lower end of the range for this shape, and pick a maximum GC content threshold such that there are some designs with GC less than this threshold that also have clean pairwise probability plots.

2) Look at the designs in descending order of current number of votes, and vote for any that have GC content less than threshold and clean plots.
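Step 2 of this voting strategy is easy to express in code once the threshold from step 1 has been picked by eye. The sketch below assumes a made-up record layout: each design is a dict with `name`, `sequence`, `votes` and a `clean_plot` flag that a human (or some other tool) has already set after looking at the pairwise probability plot.

```python
def gc_content(sequence):
    """Fraction of nucleotides in the sequence that are G or C."""
    return sum(1 for base in sequence if base in "GC") / len(sequence)


def pick_votes(designs, gc_threshold, max_votes):
    """Walk designs from most to fewest votes; keep low-GC, clean-plot ones.

    designs: list of dicts with hypothetical keys
             "name", "sequence", "votes", "clean_plot".
    """
    ranked = sorted(designs, key=lambda d: d["votes"], reverse=True)
    chosen = [
        d["name"]
        for d in ranked
        if gc_content(d["sequence"]) < gc_threshold and d["clean_plot"]
    ]
    return chosen[:max_votes]
```

Sorting by current votes first mirrors the "descending order of current number of votes" part of the strategy, so the picks lean towards designs other players already favor.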

Meta-comment: Some people have described their self-interested voting and designing strategies with a note of cynicism. I think that the cynicism is misplaced. The game is essentially an ensemble algorithm: we have many agents with their own algorithms, and the voting game is how the strategies get merged. Ensemble algorithms usually rule in natural language processing, and they seem to rule here too. You could probably write a computer program that captured this.

blubblub

  • 60 Posts
  • 1 Reply Like
We saw the exchange between Matt747 and Jeehjung tonight in the eteRNA community forum chat area concerning color and coordinates for eteRNA designs. Both ideas complement each other and would be excellent tools. If color blind mode means the ability for players to assign their own colors to current data sets that is fantastic. If so, team blubblub would like to make five requests based on our experience: at a minimum, could the color black be added to the current color template?; could we be given the ability to assign any color to any nucleotide (Example: guanine could be red but it could also be black, yellow or blue); could the color template be made available for both the synthesized and unsynthesized data sets?; could the expanded color template be made available for player created puzzles?; could at least one nucleotide ball template be expanded to 1024 balls?

If future developments provide these additional layers of support, we will happily post and share our results with the eteRNA community.

Jeehyung Lee, Alum

  • 708 Posts
  • 94 Reply Likes
Xmbrst, thanks for sharing your strategy! GC ratio really seems to play a major role in picking good designs among players.

Berex NZ

  • 116 Posts
  • 20 Reply Likes
Ok well I finally managed to get some time to finish this off.
This is how I go and solve a puzzle, and for the 98% of them out there, it’s all you’ll ever need.
Although some of the more enterprising player puzzles utilise specific combinations, that you’ll discover if you have the time and patience for it.
Disclaimer: I’m not telling anyone how to play, but this is just how I go about it.
Two main approaches
A) From scratch using standard heuristics
B) Modifying best lab score, using experience gained, stabilising tetraloops and minimising high energy differential neighbours.
Ok, let's start.
1. Use Q to highlight the whole puzzle in AU pairs.
Now it's fine if you have all A’s on one side and U’s on the other. But every now and then you’ll need a pair to alternate to stabilise that stem. For most stem lengths, 2 alternates should be enough.
2. You locate and identify all the loops.
You should know your 1-1’s, 2-2’s by now. If not, please refer to the loop guide.
Tetraloops, G on the first nucleotide (nt) on the right.
If you so choose, you can leave your tetra’s empty with A’s. The reason for this is that your puzzle will then have fewer possibilities to bind to the other nt’s. Modifying your tetra’s can complicate the folding process.
Loops with two stems, G on the same side of each connector, usually on the side with less nt’s.
Internal Bulges are annoying, always wrap them on GC’s on either side.
1-3’s G on bottom and top right nt.
Multiloops can be tricky. No general rule to them but if you are just trying to lower their energy, try a G around the connections. At least one of them will work. And if you are lucky, sometimes the opposite nt will also work, lowering it further.

3. Play it by ear at this point. Look at your minimap, look at what areas are still red. I work on the strategy of using GC’s to stabilise, then minimise GC’s later. If the puzzle is especially loop heavy, watch your energies and neighbours. Because with loop heavy designs, they are more prone to domino effects.

4. By the time you’ve reached 50k points, you should have developed an intuition already. Trust it.
Now just do three things. You mouse-over each quad, checking the energies. Generally if any quad is higher than 3 difference, you lessen the differential. Usually found when a GC is next to an AU.
Second, I go to RNAFold, to measure entropies. Usually I don’t bother with the head or tail, cos they are hard-coded, mostly out of your control. But generally I try to keep the rest of my designs under 0.05.
Third, read your design from the end to the front. aka 3’ to 5’, where 5’ is the beginning of your sequence on nt 1. I do this because that is how the RNA ends up folding, in reverse. And you try to minimise the chances of it mis-folding.
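The first of the three checks in step 4 can be roughed out in code. This is a sketch under one particular reading of the rule, namely that a problem is two neighbouring quads whose energies differ by more than 3. The function name, the list-of-energies input and that interpretation are assumptions made for this example; the energies themselves would still have to be read off the game.

```python
def large_differentials(stack_energies, limit=3.0):
    """Flag neighbouring quads whose energies differ by more than limit.

    stack_energies: energies of consecutive quads along one stem, in order.
    Returns (index, next_index, gap) for each flagged neighbour pair.
    """
    flagged = []
    for index in range(len(stack_energies) - 1):
        gap = abs(stack_energies[index] - stack_energies[index + 1])
        if gap > limit:
            flagged.append((index, index + 1, gap))
    return flagged
```

A flagged pair is a candidate spot for lessening the differential, for instance where a GC sits next to an AU.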

Please note: In nature it folds from the front to the end.

If you are designing a lab that is past round 1, you should always base your design on the top winner of the previous round/s. I know some people dislike the refining strategy. But then you can start off with a control and measure what works or doesn’t work in future rounds. E.g. Branches: look at the difference between mat’s V1 and Berex 3-1.

Hope this helps. Enjoy! :)

Jeehyung Lee, Alum

  • 708 Posts
  • 94 Reply Likes
This is incredible Berex,

In fact, Eli and mat747 have been talking about your energy differential points in 4. (It was also discussed by AnticNoise in one of the old GetSat posts: http://getsatisfaction.com/eternagame...)

cdmonson

  • 1 Post
  • 0 Reply Likes
Generally, I start by copying an early design that has a decent amount of votes, thus saving time and improving the odds of gaining some points if I vote on both theirs and mine. I also tend towards submissions with stats (bond counts, MP, etc.) that are solidly mid-range.

I then go through pair by pair, strengthening any weak points and then weakening some of the overly-strong sections. I try to strike a balance between stability and ease of bond formation (because that's what life would do). Not too many GUs or CGs, but not too few either.

I'll submit that and then make a few minor alterations and submit those as well.

Jeehyung Lee, Alum

  • 708 Posts
  • 94 Reply Likes
This is amazing - thanks cdmonson!

I'm finding more and more people using the energy balance approach. May I ask how you search for overly strong or weak points?