Switch Scores for EteRNA Switch Puzzles

Updated 5 months ago

An exciting direction in EteRNA is the study of riboswitches!

We have recently finished our pilot experiments with great initial success. Using a new technique that measures switching directly on a sequencing chip, we can observe the switching of thousands of designs at once. The signal is generated by a fluorescent RNA-binding protein, MS2. Instead of the standard EteRNA score, which is based on the correct folding of each base, we have introduced a new Switch Score.

The Switch Score (0 - 100) has three components:
1) The Switch Subscore (0 - 40)
2) The Baseline Subscore (0 - 30)
3) The Folding Subscore (0 - 30)

The scoring scheme is summarized below. A more detailed description is given in this PDF:
https://drive.google.com/open?id=0B_N0OA9NROPGel80SG5LM0wtZms&authuser=0

A typical example of a switch puzzle is shown below:


The player designs the structures in [1*] and [2]. To observe the switching we then measure the fluorescent signal of MS2, which binds specifically to the MS2 hairpin seen in [2]. In the absence of FMN, the MS2 should bind and the switch is ON. On the other hand, if we introduce FMN, the ligand in [1*], the switch should be OFF and not exhibit fluorescence.

No switch is 100% ON or OFF in the absence or presence of ligand, but a good switch can come very close (and get a perfect EteRNA Switch Score!). At some MS2 concentration, the difference should be large (e.g., at ~100 nM MS2 in the figure below). In practice, we don't know this concentration beforehand, so instead we perform measurements at many concentrations to obtain binding curves. When the switch turns OFF (red curve), the effective dissociation constant increases. The dissociation constant, Kd, is the concentration where half of the RNA binds MS2.
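
As a rough illustration of the binding curves described above, the sketch below assumes a simple one-site binding model, in which the bound fraction is c / (c + Kd). The model choice and both Kd values are assumptions made up for this example; they are not measured data, and the actual fits are described in the PDF linked above.

```python
def fraction_bound(ms2_nM, kd_nM):
    """Fraction of RNA bound by MS2 under a simple one-site binding model (assumed)."""
    return ms2_nM / (ms2_nM + kd_nM)

# Hypothetical Kd values, not measured data:
# the ON state binds MS2 tightly, the OFF state binds it weakly.
KD_ON_NM = 10.0
KD_OFF_NM = 1000.0

for conc in (1, 10, 100, 1000, 10000):
    on = fraction_bound(conc, KD_ON_NM)
    off = fraction_bound(conc, KD_OFF_NM)
    print(f"{conc:>6} nM MS2: ON={on:.2f} OFF={off:.2f} difference={on - off:.2f}")
```

With these made-up numbers the ON/OFF difference is largest at intermediate concentrations and shrinks at both extremes, which is why measurements at many concentrations are useful.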


The Switch Subscore quantifies how far apart the Kd's are in the absence and presence of FMN (horizontal distance between the red and blue curves).

The Baseline Subscore is a measure of how close the ON-state is to the original MS2 hairpin (lower Kd is better, i.e., the blue curve should be far to the left).

The Folding Subscore is high if MS2 binds properly in the ON-state at any concentration (the score should be high for the blue curve at high concentrations of MS2, i.e., high values to the right).
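
If the binding curves are plotted on a logarithmic concentration axis (an assumption here), the horizontal distance between two curves is the base-10 logarithm of the ratio of their Kd's. The sketch below computes only that separation. It is not the Switch Score formula, which is given in the PDF, and the function name and Kd values are hypothetical.

```python
import math

def kd_log_separation(kd_reference_nM, kd_shifted_nM):
    """Horizontal distance, in log10 units, between two binding curves."""
    if kd_reference_nM <= 0 or kd_shifted_nM <= 0:
        raise ValueError("Kd values must be positive")
    return math.log10(kd_shifted_nM / kd_reference_nM)

# Hypothetical values: Kd without FMN (ON state) and with FMN (OFF state).
switch_separation = kd_log_separation(10.0, 1000.0)
print(f"ON to OFF separation: {switch_separation:.1f} log10 units")
```

The same helper could be used to compare the ON-state Kd with the Kd of the original MS2 hairpin, which is the comparison the Baseline Subscore is concerned with.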

In our first experiments, we found that the easiest score to maximize is the Folding Subscore, followed by the Baseline Subscore. These two ensure that the MS2 hairpin is properly formed in the ON-state. The hard one is the Switch Subscore, which is the highest when the energy difference between the states is finely-tuned to the energy conferred by binding to FMN (or other future ligands).

johana, Researcher

  • 96 Posts
  • 45 Reply Likes

Posted 4 years ago

johana, Researcher
Here is a plot of RMSD in Eterna Score for R95 vs R96.
There are some fluctuations depending on which designs you include in the analysis but it is consistently around 6 or 7 for designs with more than a handful of clusters.
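
For reference, an RMSD between two rounds can be computed as below, assuming each design has one Eterna Score per round and the two lists are in the same design order. The scores shown are invented for the example, not R95 or R96 data.

```python
import math

def rmsd(scores_a, scores_b):
    """Root-mean-square deviation between paired scores from two rounds."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(a - b) ** 2 for a, b in zip(scores_a, scores_b)]
    return math.sqrt(sum(squared) / len(squared))

# Invented scores for the same four designs in two rounds.
round_a = [80, 60, 92, 45]
round_b = [74, 68, 90, 51]
print(f"RMSD = {rmsd(round_a, round_b):.2f}")
```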



rhiju, Researcher
For comparison, can you plot intra-run RMSDs (e.g. if you calculate scores across all the clusters for a single sequence, what is the expected standard error on the median for eterna score)?
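
One common way to estimate a standard error on the median is the bootstrap: resample the per-cluster values with replacement and look at the spread of the resampled medians. The sketch below only illustrates that idea with invented cluster scores; it is not necessarily how the analysis was or will be done.

```python
import random
import statistics

def bootstrap_median_se(values, n_resamples=2000, seed=0):
    """Bootstrap estimate of the standard error of the median."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    n = len(values)
    medians = [
        statistics.median(rng.choices(values, k=n))  # resample with replacement
        for _ in range(n_resamples)
    ]
    return statistics.stdev(medians)

# Invented per-cluster scores for a single sequence.
cluster_scores = [70, 72, 75, 68, 71, 74, 69, 73]
print(f"median = {statistics.median(cluster_scores)}, "
      f"bootstrap s.e. = {bootstrap_median_se(cluster_scores):.2f}")
```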

nando, Player Developer
regarding the second plot, do we have an explanation for the large yield differences? at what stage does it happen? amplification(s), addition of the flanking sequences, implantation on the chip, other?

johana, Researcher
@rhiju: Yes. The standard dev. and s.e.m. for the single cluster values are already in the data but doing the error on the median is a good idea.

@nando: We don't know for sure where the yield differences arise. My hunch is that it is during synthesis and amplification rather than during the hybridization to the chip. We are already doing emulsion PCR to minimize sequence bias during amplification/addition of flanking sequences, but there is not too much we can do about the synthesis. The number of clusters was strongly dependent on the A content and we only select clusters with perfect sequencing data, so one hypothesis is that the sequences with many As yield fewer usable clusters.
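
A quick way to look at this hypothesis is to compute the A fraction of each design and compare it with its cluster count. The helper below is a minimal sketch; the sequences and cluster counts are made up.

```python
def a_fraction(sequence):
    """Fraction of bases in the sequence that are A."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return seq.count("A") / len(seq)

# Hypothetical (sequence, usable cluster count) pairs, not real data.
designs = [("GGAAAAAAACUUCGGAAAAAC", 4), ("GGCGUACGUUCGCGUACGCC", 31)]
for seq, clusters in sorted(designs, key=lambda d: a_fraction(d[0])):
    print(f"A fraction {a_fraction(seq):.2f}: {clusters} clusters")
```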

Omei Turnbull, Player Developer
@johana, what percentage of clusters get rejected simply because of an imperfect sequence?  Are there often enough clusters for an "imperfect" sequence to report the results for the sequence even though it wasn't purposely submitted?  If so, it would be interesting to see if these mutated sequences tend to do better or worse than their designed counterparts.

Omei Turnbull, Player Developer
Re "... one hypothesis is that the sequences with many As yield fewer usable clusters." -- I think the word has gotten out about this, and it is no longer a very big source of variability in cluster counts.

On the other hand, if we look only at the lower end of cluster counts, we can still see the negative impact.

So we still have evidence that a large percentage of A's is not good.

johana, Researcher
Looking at all the sequences that don't perfectly match the design should be very interesting, and it is something we have started to look into. Most of the time there are very few clusters with identical mutations, but some should give good data. Vineet has analyzed this and I will let him speak to the details. Perhaps we could make his dataset public.

vineetkosaraju, Alum
Hi Omei,

Currently we only use sequences that exactly match the design. We have everything else stored, but we are not analyzing it. These sequences are only around 10% of the actual synthesis data, so we are currently underutilizing our resources.

I've been working on analyzing R95 mutants so that we can use more of the synthesis results. Most of the time these mutants have a low number of clusters, just because there are so many possible mutations that can occur. So we will never get to using 100% of the actual synthesis data, but there is still a large improvement if we can get to even ~20-30%.

I've shifted my focus mostly from analyzing these R95 mutants to developing the switch strategies version of Eternabot, so I don't remember the exact numbers.

I think that originally there were around 10 million sequences, and only 1 million matched exactly. From R95, I was originally able to recover around 4 million additional sequences, but there is still far more analysis to be done so that we continue filtering out bad fits, so that number can (and probably will) drop significantly.

Once I finish implementing the switch strategies version of Eternabot I will go back to analyzing R95 data, filtering out bad fits, to see how many sequences there are remaining.

Afterwards I think it is a good idea to make the dataset public, and it would be interesting to see if any mutants are better than their originals at switching.

Keep in mind that this process of matching mutations to their original sequences will sometimes be inaccurate (the process is far from perfect, I would call it a greedy algorithm) - but we are hoping that the sheer mass of data will allow for us to form general predictions/trends and overcome those inaccuracies.

Thanks,
Vineet

Omei Turnbull, Player Developer
Thanks for the response, Vineet!

A while back, I was examining a set of data files from the Illumina sequencer that had not only its best estimate of what sequence it had read, but a confidence level for each base.  I had the impression at that time that the data processing for the game just took the best guess and rejected the sequence if it didn't match one of those submitted.  If that is indeed what was happening, it would throw out both true mutations and mis-reads.

When you say "matching mutations to their original sequences", are you referring to using this confidence data to try to distinguish true mutations from scanner mis-reads?  Or something else?   On the surface, it would seem that if there were a sufficient number of clusters with the same "best estimate" sequence, it could be taken at face value that it was a real mutation (perhaps a multiple mutation) and worthy of reporting.

But then, problems always seem simpler when I'm not the one being asked to solve them. :-) 

johana, Researcher
Hi Omei,

We do indeed get the Illumina quality score for each base. For the regular analysis we throw out all mismatches, irrespective of the quality score. For the expanded analysis Vineet was describing, looking at the quality score is a very good idea.
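
For readers unfamiliar with the quality scores: a Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10). The sketch below decodes a quality string, assuming the Phred+33 ASCII encoding used in current Illumina FASTQ files. The example string and the idea of looking at the weakest base are illustrative assumptions, not part of the existing pipeline.

```python
def phred_to_error_prob(q):
    """Estimated probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)

def decode_quality(quality_string, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]

# Made-up quality string: 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
scores = decode_quality("II5I")
worst = min(scores)
print(f"scores={scores}, weakest base Q{worst}, "
      f"error probability {phred_to_error_prob(worst):.3f}")
```

A mismatch at a low-quality position is more likely to be a mis-read, while a mismatch at a high-quality position is more likely to be a true mutation.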

I believe that the "matching" refers to mapping the mutated sequence to the original design. For example, players sometimes do mods of a previous design and if a sequence has one or more mutations it can be hard to distinguish which of several mods it belongs to.

Brourd
A query in regards to the adenine percentage and cluster yield relationship. Is it possible that something like pyrimidine dimerization is occurring in the DNA templates? The presence of thymine dimers could possibly explain why there is such a low cluster yield for these sequences, but it would require the DNA to be exposed to UV radiation somewhere along the synthesis pipeline.

Omei Turnbull, Player Developer
@johana:  Why is it important to map the mutated sequence to the one that it was mutated from?  On the surface, it seems like it should be treated the same as any other mod (but with the name and ID assigned ex post facto).  We could even assign it to a specific player, named realworldscience.

vineetkosaraju, Alum
Hi Omei,

I didn't know that the Illumina sequencer has confidence levels, but it sounds like a good idea - looks like Johan has been holding out on me :P

Currently, mutations are matched to their original sequences by calculating a distance between each sequenced read and each of the original designs. This distance is the Levenshtein distance, which counts the number of single-point changes (insertions, deletions, substitutions) needed to turn one sequence into the other. The match is found by simply selecting the original sequence that has the smallest distance. My explanation is pretty bad, so here is a diagram that (hopefully) helps clarify:
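
In code, the matching described above might look like the sketch below. It is a minimal illustration using the standard dynamic-programming Levenshtein distance, not the actual analysis code; the function names and toy sequences are invented. It also returns the distance, so that near-ties between similar mods can be inspected.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def closest_design(read, designs):
    """Greedy match: return (design, distance) for the nearest original design."""
    best = min(designs, key=lambda d: levenshtein(read, d))
    return best, levenshtein(read, best)

# Toy example with invented sequences.
originals = ["GGAAC", "CCUUG"]
print(closest_design("GGAUC", originals))
```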



@Brourd, I'm not sure if Johan exposed the DNA to UV radiation, but I have a friend also at Stanford who accidentally left the radiation on over the weekend and caught it on Monday - he's in a different building thankfully :P 

Thanks,
Vineet

Omei Turnbull, Player Developer
Thank you Vineet.  And your explanation was perfectly clear, no need for apology.

I can see why figuring out the "evolutionary history" of mutations would be an interesting study in its own right.  Especially if it led to advances in controlling mutations.  I'll be interested in seeing what you find.

Still, I wonder why that is relevant to reporting the switching results.

vineetkosaraju, Alum
Hi Omei,

I don't remember exactly why matches were needed for reporting the switch results, but I think it had something to do with fitting. As much as I like to pretend, I don't know much about the binding curve fitting that Johan is performing - I'll wait for him to respond with a more accurate/eloquent answer :)

Thanks,
Vineet

rhiju, Researcher
@brourd, there's no UV treatment in the synthesis or purification protocol, at least on our end. The explanation of the problem with high A's is very likely a depurination side reaction that occurs during synthesis and chemically destroys As. Known problem. We are actually now talking to other companies who have resolved the issue (but are not used to delivering the kinds of libraries that eterna needs).

johana, Researcher
@omei: You're absolutely correct in that we don't need to map the mutated sequences to the original design in order to report them. The results and the fits are valid regardless. Knowing the original design would just be a tool for putting the mutated switches in the right context. In other words, grouping all the derivatives of an original design may help identify what makes it better or worse.

In terms of reporting the results, the best is probably to hand out all the data and include fields that indicate the closest match and the mutations.

salish99
Is even ambient lighting from, say, LEDs, neon tubes, or, gasp, Apple iPhone screens already a problem?

johana, Researcher
The data are collected in the dark, so I don't think that ambient light is a problem.

salish99
good practice

Eli Fisker
Round MS2 NG sum up

Distance of MS2 and FMN sequences in exclusion and same state labs

As suspected, placing MS2 earlier and closer to the FMN sequence in a Same State lab didn’t exactly raise overall score.


“Exclusion labs seems to want to have the MS2 real close to one of the aptamer sequences. As the labs where more distance were forced in Exclusion 5 and 6, didn’t score too well and the majority of the high scorers became full moving switches.” (Footnote 1)


Also placing the MS2 sequence further away from either of the FMN sequences in the exclusion labs, didn’t raise score, rather the contrary.


“Where Same State lab types actually prefers some distance between MS2 and FMN.” (Footnote 1)


This generally held true for the high scorers I have been watching, even though this time we were able to move the MS2 closer to or further from the aptamer sequence.


Some labs that stuck out


I have a note back for the labs that have MS2 in between the FMN sequence, be they exclusion or same state. (Same state 2, Same state NG 2 and Exclusion NG 2)


For the exclusion labs, in the lab where the MS2 is in between the FMN sequences there is slightly more willingness to keep a bit of distance between MS2 and FMN, compared to the labs where the MS2 is on either side of the FMN sequences.

Similarly for the Same State labs: where the MS2 is in between the FMN sequences, there is slightly more willingness to have the MS2 closer to the aptamer sequence than in the labs where the MS2 is on either side of the FMN sequences.


Same state labs


The normal distance between MS2 and aptamer for Same State labs, no matter the side, is 6-11 bases. Above 11 bases, the design starts benefitting from having some of the unpaired bases knotted into a static stem.


Exclusion labs

The normal distance for Exclusion labs is 0 bases between MS2 and aptamer. A distance of up to 3 is tolerated, but distance is generally less beneficial the bigger the gap gets.
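
The gap between the MS2 and an aptamer sequence can be counted directly from a design sequence. The helper below is a hypothetical sketch: the motif strings are passed in as parameters, and the ones used in the example are placeholders rather than the exact lab sequences.

```python
def bases_between(sequence, motif_a, motif_b):
    """Number of bases separating the first occurrences of two motifs."""
    a = sequence.find(motif_a)
    b = sequence.find(motif_b)
    if a < 0 or b < 0:
        raise ValueError("motif not found in sequence")
    if a <= b:
        return max(0, b - (a + len(motif_a)))
    return max(0, a - (b + len(motif_b)))

# Placeholder motifs and an invented design, for illustration only.
APTAMER_PART = "AGGAUAU"
MS2_PART = "ACAUGAGG"
design = "GG" + APTAMER_PART + "CCCC" + MS2_PART
print(bases_between(design, APTAMER_PART, MS2_PART))
```

Under the ranges given above, a gap of 0 to 3 would fit the Exclusion labs and a gap of 6 to 11 the Same State labs.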



Exclusion 5 and 6


Plus a small look back on the part 2 of Round 3


I have been saying that it was as if Exclusion labs 5 and 6 almost insisted on being full moving switches, or close to full moving.


Funnily enough, my insisting in some cases on treating those two labs as model switch labs, by putting in multiloops in both states and making the switching area small, resulted in a stack of designs with the fine score of 0. Couldn’t help but laugh about that. :)



I also got a stack of clean 30 scores by doing exactly the same. :) Although to be fair I did manage a 74% score out of max 80% till now, with a partial moving switch. But all the higher scoring ones really were full moving switches. :)



This lab definitely wants to switch big time. :) But I still don’t like full moving switches. :)


I count full moving switches less relevant (for now), since why go in circles to get somewhere, when one can take the direct route, by putting the switching elements close together to make certain that things happen.


Background articles and footnotes


Footnote 1

MS2 NG Data + drawings

salish99
congrats on the straight string of zeroes. Always feels good to have those in the bag, I know the feeling well.

Eli Fisker

Long and multiple repeat U’s being problematic


While I have been a strong advocate for repeat bases in riboswitches, I am seeing something that is making me change my mind a bit, at least for now, in particular when it comes to U repeats. In the first round I recall there being many repeat U’s, particularly in the designs that had high cluster counts. That is still the case.


However the repeat U’s disproportionately show up in the lower scoring designs and far less so in the high scoring ones.


What might be going on?


I can’t say whether this intolerance of repeat U’s has something to do with the fairly short length of our sequences; short sequences can naturally tolerate fewer and shorter U repeats than a sequence several hundred bases long. Nor can I say whether limitations in the lab method are what make many repeat U’s problematic. So, as a word of warning: careful with the UUUUUU’s.

If I should guess at what these long lines of U’s do, they seem to be hurting the folding change and baseline subscore.

Another reason why the many repeat U’s seem to be a problem is that they are often synonymous with a full moving switch. Lots of repeat U’s are simply more likely to result in the full switch moving, and as such I count them as bad. So bringing down the repeat U content should also bring down the number of full moving switches.



Advice


Advice on what I see as problematic for now:


  • Designs with 2 UU’s - no trouble, unless there are more and they are like 1 or 2 bases apart. (Dangling tails with C’s and U’s are a bit more tolerant to repeat U’s and similar is the MS2 turnoff sequence in exclusion labs)

  • Designs with 3 UUU’s may get in trouble, while mostly not. More if there are more of them.

  • Designs with 4 and more U’s are more frequently among the lower scoring designs.

  • The more and longer U repeats, the worse.

  • Repeat U’s in loops seem to bother less.


I will be thankful for input on this. Please bring forth your graphs and statistics and tell me: do we need a U-meter too?
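
A "U-meter" could be as simple as listing the U repeats in a design. The sketch below is a hypothetical helper, not an existing game tool; it reports each run of consecutive U's with its start position and length, using an invented sequence.

```python
import re

def u_runs(sequence, min_length=2):
    """Return (start, length) for every run of at least min_length consecutive U's."""
    pattern = re.compile(r"U{%d,}" % min_length)
    return [(m.start(), len(m.group(0))) for m in pattern.finditer(sequence.upper())]

def longest_u_run(sequence):
    """Length of the longest run of consecutive U's (0 if there are none)."""
    return max((length for _, length in u_runs(sequence, min_length=1)), default=0)

# Invented sequence for illustration.
seq = "GUUACUUUUGU"
print(u_runs(seq), longest_u_run(seq))
```

Counting runs this way would make it possible to check the advice above against the data, for example by comparing the longest U run with the score across all designs in a round.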


Background article


Repeat A’s and U’s and cluster count


Hyphema
Interesting, Eli. I must admit I did keep in mind the earlier thoughts that long stretches of U's might be helpful. This may have been influenced by the miRNA lab, but nevertheless I did just submit a few designs with stretches of U's. I think many had them in loops, where I would agree with your latest thought that they may not be as detrimental. Most are mods, so there are many more designs to be seen with poly U's.

Brourd
The possibility exists that the uracil residues within loop sequences are participating in alternate folds within the ensemble. It may be a good idea to add in a consecutive uracil requirement just for the sake of preventing sequences that may have 7+ consecutive uracil nucleotides. (7 being a randomly chosen number).

Eli Fisker
I got inspired by the story Macclark shared on poly(A), which makes the transcript halt, and started to wonder if poly(U) can cause similar problems, since I already think repeat U's are affecting our lab results.

Long and multiple repeat U’s being problematic

I found a small note in a book:


(Recoding: Expansion of Decoding Rules Enriches Gene Expression: Expansion of ..) page 417.

https://books.google.co.uk/books?id=8cSZpPWXoqIC&pg=PA417&lpg=PA417&dq=poly%28U%29+slipp...

Macclark sharing the article:

https://getsatisfaction.com/eternagame/topics/fun_rna_and_dna_science?topic-reply-list[settings][fil...

Plus, T7 polymerase should be the one in use in the lab; at least it has been in the past. Nando wrote an awesome blog post some time back about some of its strange quirks. It's quite entertaining. Hereby recommended!

What if somebody already knows?

salish99
I will have to find my extremely high scorer that was packed with U's again - was it round three in the standard MS2, not the relocatable ones, hm, can't remember now... not sure if this universally applies

salish99
Oh, and please define, in mol/l, what a moderate concentration of UTP would be considered to be, on average.

johana, Researcher
We use 1 mM UTP, and the concentrations are the same for the other nucleotides. T7 polymerase was used for the chemical mapping experiments, but for the array we actually use E. coli RNAP instead, since it allows us to stall at the end of the DNA. I don't know how many of the quirks carry over and how many new quirks are unique to E. coli RNAP. Poly-U and secondary structure formation are involved in termination (https://en.wikipedia.org/wiki/Intrinsic_termination).

Eli Fisker
Knotting asymmetric tail ends

I have long been saying that it is generally a good idea to knot the tail ends of the RNA. Here is one more addition to my recommendation: the lengths of the tails matter for how best to tie them together.


What I see is a starting tendency towards throwing out a tail of bases to get the extra bases away from the multiloop or stem in the switching area, instead of just making the multiloop bigger or increasing the number of base pairs in the switching stem.



Riboswitch on a switch versus NG labs

In our early Exclusion and Same state labs, the tail ends were often too short to knot them together - eg in the Exclusion 1 and Exclusion 4 labs and Same State 1.

This is no longer the case in the NG labs. However, an asymmetric component has entered the tail lengths, as one tail end is much longer than the other, even after adding in the MS2. (This applies to Exclusion NG 1, Exclusion NG 3, Same State NG 1 and Same State NG 3, where the MS2 does not get placed in between sequences of the aptamer.) So just knotting up the tail ends so they match from start to end is, I think, no longer going to cut it if we are to make tons of winners.


Tying up tails of unequal length as if they were even does affect the rest of the design: in practice, either the multiloop in the switching area gets bigger or the switching stems in the switching area get longer.


I see more of the lower scoring designs to have far bigger multiloops compared to the main part of the high scoring designs. Across all types of MS2/FMN designs.


Also, as players have noticed before, designs with a high ratio of unpaired to paired bases tend to do badly.
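
The unpaired-to-paired ratio can be read straight off a target structure written in dot-bracket notation, where "." is an unpaired base and "(" or ")" is a paired one. The sketch below is an illustrative helper with an invented structure; it makes no claim about where the cutoff for "doing badly" lies.

```python
def unpaired_to_paired_ratio(dot_bracket):
    """Ratio of unpaired bases ('.') to paired bases ('(' or ')')."""
    unpaired = dot_bracket.count(".")
    paired = dot_bracket.count("(") + dot_bracket.count(")")
    if paired == 0:
        return float("inf")  # nothing is paired at all
    return unpaired / paired

# Invented structure: a 4 base pair hairpin with a 4-base loop and a 2-base tail.
print(unpaired_to_paired_ratio("((((....)))).."))
```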


The thought started lingering while I was designing for the Exclusion NG 1 and 3 labs, round 2. I started kicking some of the single bases out as tail bases in both states. I have earlier been kicking them out as hairpins as well. Later I was doodling RNA. Here is what I drew, based on what I thought each lab solve ought to look like in the labs where the MS2 is not between the FMN sequences.


Cut and glue version


Cut and glue doodles2jpg


Color code:

Blue = static stem

Orange = FMN

Yellow = Salish hinge

Pink = MS2 or parts of MS2

Grey = Aptamer gate stretch that turns up between FMN and a static stem


The main thing to notice here is the dangling end. Normally dangling ends with unpaired bases are not very productive for designs. They tend to get in the way and want to pair up with anything in close vicinity either in sequence or space, if given the slightest chance and them having too high a frequency of non A bases - like my single stranded barcode experiment. Only in our first round microRNA labs Jandersonlee showed these dangling ends with unpaired bases to be productive, if they were complementary to and targeting a region in the microRNA.


However, here I think that for the NG labs the tail will be beneficial not for any value it adds to the design, but rather for making up for what seems to be a problem. It takes what seem to be excess bases away from the multiloop that either holds or is partly made up of the MS2 sequence.


Exclusion NG Round 1 Results

A few of the lab high scorers from last round do hint of the benefits of throwing a tail (NG 1 and NG3 labs) or hiding the excess bases in the static stem (NG 2 labs)


Here is one by Brourd, 82%, 13 clusters

http://www.eternagame.org/game/browse/5851784/?filter1_arg2=5943980&filter1=Id&filter1_arg1=5943980


JMF mod by Hyphema, 90% cluster count 22

http://www.eternagame.org/game/browse/5851785/?filter1_arg2=5951805&filter1=Id&filter1_arg1=5951805


These two designs were amongst the highscorers in their respective labs.


Dangling end or hairpin?

Also, jandersonlee did a design where he tied up the tail bases in a hairpin. I think that as soon as the tail reaches a length where it can make a hairpin, say 8-10 bases or more, it will start to be beneficial to make it a hairpin instead of an end dangle.


So basically I think both an end dangle or a hairpin stem as means to get excess bases away, are a viable road to take.


Why is there a static stem in the switching area?

I’m starting to wonder how much the static stem is needed. I still do think it plays a role, but I think perhaps its size is determined more by the bases that need to get tucked away than by what it really needs.

Getting rid of extra bases by prolonging the static stem in the multiloop in the switching area.

NG2 Estimate - State 2jpg


I still think the static stem, which results in a multiloop in the switching area, adds something to the switchability that an internal loop can’t. I think it makes the design more unstable, which is an advantage when one wants a switch to move.


I wonder what is the kcal and entropy of a multiloop compared to an internal loop, provided that both had the same amount of ring bases?


Here is the state 1 of the lab drawings above

Guestimate - State 1jpg


The NG 2 lab drawings made me think that I don’t think that each mirror type of solve, is equally good. I think the static stem has a preferred side.


Idea for future NG 2 labs

I think for us to make the most optimal version of an RNA switch in the NG labs, we need to be able to determine the position of at least one of the FMN sequences too, and not just the position of the MS2 sequence. At least for the Exclusion and Same State NG 2 labs: in these labs, where the MS2 is between the FMN sequences, we don’t have the luxury of kicking the unwanted bases out of the design in a dangling tail. In the NG 2 labs I see a tendency towards prolonging the static stem, which is another way of getting rid of the bases.


Astromon
Thanks great info there. I just want to say the new labs have four locked green bases and need a dangling end to keep the constraint only three in a row G's and C's cleared. maybe there is another way that I'm unaware of, (and prob so) I just wanted to say what I observed. Thanks!

Eli Fisker
Hi Astromon!

Nicely noticed. It is possible to find more than 3 C's or G's in a row in nature, so microRNAs, riboswitches and other kinds of RNA can have them. It is just that our lab can't handle too many of these kinds of repeats. Nature doesn't like too many of them either, but is a bit more tolerant.

If you check out the sequences of natural riboswitches, you can spot them yourself. Here are some images with sequences from naturally occurring riboswitches:

https://getsatisfaction.com/eternagame/topics/switch-scores-for-eterna-switch-puzzles?topic-reply-li...

Eli Fisker
Omei had a comment to this section in the post above:

Also, jandersonlee did a design where he tied up the tail bases in a hairpin. I think that as soon as the tail reaches a length where it can make a hairpin, say 8-10 bases or more, it will start to be beneficial to make it a hairpin instead of an end dangle.

So basically I think both an end dangle or a hairpin stem as means to get excess bases away, are a viable road to take.

I like how he explains it:

My own thoughts at this point is that if a loose end isn't serving a specific purpose (like acting as a landing spot for a miRNA), and is long enough to bind into a hairpin, it should be.

But are there examples in the data where this is the only difference between two designs?

So does any of you know of examples? Feel free to add them here.

Eli Fisker
Overall tendencies for MS2 Riboswitches On Chip - Round 3


We just got new data for round 4. :)


But while I’m waiting for it to upload in the game, I will post what I have been working on in relation to round 3. It is probably not going to get more finished. So here it comes.



I decided to draw up some of the main tendencies from the Riboswitch on a chip labs for comparison with the NG labs.


I have developed a new drawing style. Instead of showing the two states as two rectangles apart, I have fused them to 1. Instead I indicate State 1 with arrows above and State 2 with arrows below.



Exclusion 2

Interestingly enough, the high scorer based on one of Mat’s designs in Exclusion 2 seems to have its MS2 C’s flashed. But they are in an internal loop and not a multiloop. Plus, this design looks to have 13 base pairs switching, which is far more than in the main part of the winners in the other MS2 labs.


Second highest scoring design from round 3 (89%) (I discriminated against the almost full moving high scorer. :) )

http://www.eternagame.org/game/browse/5736148/?filter1=Id&filter1_arg1=5815531&filter1_arg2=5815531



Exclusion 2 versus Exclusion 3
 

Another thing I found really interesting was that the tables totally turned for the Exclusion 2 and Exclusion 3 labs when they were run as the bigger Exclusion NG 2 lab. In the early MS2 labs, Exclusion 3 totally outscored Exclusion 2. However, in the NG2 lab it is the exact opposite case.


There was overall agreement for high scorers in Exclusion 3.


Exclusion 2 and 3 - draftjpg


I believe that Exclusion NG2 carries the true pattern for what will work, over both Exclusion 2 and 3. I think MS2 prefers to be to the one side over the other.


I have highlighted the CACCC bit of the MS2. Notice how all the highest scoring designs have the MS2 placed to the right. This placement is equivalent to the MS2 placement in Exclusion 2. And Exclusion 2 scored horribly compared to Exclusion 3.



I think I have an explanation on why. I have drawn an estimate on what I think optimal Exclusion 2 & 3/Exclusion NG2 should look like below.



The MS2 position prefers being fixed next to or very close to one of the FMN sequences in the Exclusion 2 and 3 labs. In Exclusion NG2 we have free choice of side.


The aptamer gate will have to pair with the part of the MS2 that is next to the aptamer. Which leaves the base sequences there rather locked, no matter which side the MS2 is on in relation to the aptamer.


State 1 on top and State 2 at bottom.

NG2 Estimate.jpg

Notice that the left bottom image has the MS2 C’s available in the multiloop in state 2. (Green dots). Also notice that the Aptamer twin G’s in FMN1 are unpaired and available. (Red dots)


Similarly for state 1 (top left image), the MS2 turnoff and aptamer turn-on have some pyrimidines unpaired and available for a shift.


I think that pyrimidines are easier to get moving than purines, although purines, and G’s especially, are also involved in switches - just think of both sequences of the FMN aptamer.


But the available C’s are why I think the bottom left option (equivalent to Exclusion 2), with the MS2 to the right, is the dominant one in the NG 2 lab: it has easier access to getting its C’s involved in a switch, by having some of them available in a multiloop.


The C’s in the MS2 are spaced just close enough, and not too far away, for them to get in use. Whereas when the MS2 is placed right after FMN (right bottom picture)


Also, one or both of these MS2 C’s are regularly involved in a direct pair with one of the FMN G’s.


Why does flashing some C’s for switching matter in state 2, when the design is already switched on?


I read somewhere that enzymes are more effective if they can both get on quickly and do their action, and also easily let go. I think the dangle sequences are helping overcome an energy barrier for switching - and that this can happen both ways. E.g. it is of no use if a design gets easily turned on in state 2 but can’t go back to state 1 when the molecule that should make it shift back is present.


In other words, I think the sequence overhangs in either the multiloop or the FMN loop can help get a switch moving. And as such I think we can use them consciously to help get things going.



Same state 2

I have been drawing some of the main types of high scoring solves that also have decent cluster counts (20+). I have ignored all kinds of fuller moving switches.


These design types have a number of representatives, that are very much alike. Also notice how the static stem turns up at the same side of the MS2 for all types. Watching them as linear, they look very similar except for minor differences, especially around the aptamer.


Same state 2 - finished draft.jpg


Here is an attempted average for the Same State 2:



Note in particular that the MS2 has both its ends bound up in state 1. This is what I see in both Same State and Exclusion high scorers now. Only the Exclusion labs do it in state 2 instead of state 1, since they are opposite labs. One should have its MS2 turned on (Same State), the other should have its MS2 turned off (Exclusion).


I believe strapping up the MS2 from both sides is a useful characteristic. It is possible to solve with just one stretch of the MS2 moving, but catching both ends of the MS2 sequence seems to be getting more and more frequent. The MS2 is more than happy to pair with itself, so it doesn’t need much coercion to fold, hence the need to hold it well at both ends.


Example of MS2 being bound up in both ends, when it is not on.


Salish99-ss2-r3-066 (99%)


Eli Fisker

  • 2222 Posts
  • 483 Reply Likes
The curious case of the missing MS2 gate


I couldn’t help looking at some of the new results from round 4, so I am now starting to blend in some round 4 analysis along with my late round 3 thoughts.



MS2 Gates - where and when


Example of an MS2 gate, taken from Salish’s winning mod of a Brourd design in Exclusion 3, round 3.


http://www.eternagame.org/game/browse/5736149/?filter1_arg1=5744114&filter1_arg2=5744114&filter1=Id


MS2 gate explanation: Exclusion labs favor having their MS2 sequence next to one of the FMN sequences. The MS2 also likes to have a turnoff sequence next to itself on the other side. The MS2 gate happens when the FMN sequence on one side of the MS2 pairs with the MS2/aptamer turnoff sequence on the other side. An MS2 gate thus consists of a strand that could turn off the MS2 in state 2, but in state 1 is instead paired up with the FMN sequence, turning the aptamer off while keeping the MS2 turned on.


In the early rounds of the Riboswitch on a chip labs, a good part of the high scoring designs in Exclusion labs 1, 3 and 4 had a habit of having MS2 gates in state 1. I first took notice of them in the XOR type simulation puzzles and afterwards noticed them in the Exclusion labs too. However in Riboswitch on a chip, round 3, I noticed one design that took a different direction.


Exclusion 2 prediction

Most of the Exclusion 2 lab designs had far more bases involved in the switching than the Exclusion 3 designs. A huge number of bases involved in switching is something I generally consider bad. Which was why I noticed Perushev’s design (88%) in round 3 in the Exclusion 2 lab, as it behaved differently from the other high scorers. It behaved much more like an Exclusion 3 design - with a magnet segment targeting FMN1 - and in that lab we could already make lots of winners. However, contrary to the Exclusion 3 pattern and the general Exclusion 1, 3 and 4 pattern, Perushev’s design did not target the FMN closest to the MS2, which some of the designs in the other labs had done. Also it had a minimal MS2 gate.
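One rough way to put a number on "bases involved in the switching" is to compare the predicted pairing partner of every base in the two states. The snippet below is only a minimal sketch under assumptions: it expects both states as dot-bracket strings (as shown by the folding engines), ignores pseudoknots, and the function names and the toy structures are made up for illustration, not taken from any lab design.

```python
def pair_table(structure: str) -> list:
    """Return, for each position, the index of its partner (None if unpaired)."""
    partners = [None] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            j = stack.pop()
            partners[i], partners[j] = j, i
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return partners


def switching_bases(state1: str, state2: str) -> list:
    """List the positions whose pairing partner differs between the two states."""
    if len(state1) != len(state2):
        raise ValueError("both states must have the same length")
    p1, p2 = pair_table(state1), pair_table(state2)
    return [i for i in range(len(p1)) if p1[i] != p2[i]]


# Hypothetical toy example: the outer stem is static, the inner stem moves.
state1 = "((.((...))...))"
state2 = "((....((...))))"
moving = switching_bases(state1, state2)
print(len(moving), "bases change partner:", moving)
```

A low count next to a high score would fit the "few bases switching" pattern described above; the count is in bases, so a base pair that breaks contributes two.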


Here is my favorite from the last round, by Parushev. I was predicting that it would do well compared to its higher scoring peers in the then upcoming round 4.


Score 88%

http://eterna.cmu.edu/game/browse/5736148/?filter1_arg2=5764818&filter1_arg1=5764818&filter1=Id


And indeed this pattern with targeting the FMN1 is looking to take over the pool in the Exclusion NG 2 lab (round 1).


Furthermore the new Exclusion 4 round data had a winning mod of the Perushev design. :)

 

Score 97%, cluster count 29, fold change 21. 

http://www.eternagame.org/game/browse/5962962/?filter1=Id&filter1_arg1=5988519&filter1_arg2=5988519


Targeting opposite FMN to usual and not leading to MS2 gate.

The 4 designs that score above 94% with cluster counts above 20 in Exclusion 2, round 4, all seem to target FMN1 - the FMN sequence furthest away from the MS2 - leading to no MS2 gate forming. (Designs watched in Vienna 2, and for the two not folding in Vienna 2, both NUPACK and Vienna1 agree.)


Background puzzle and post

http://www.eternagame.org/web/puzzle/6042823/


https://getsatisfaction.com/eternagame/topics/switch-scores-for-eterna-switch-puzzles?topic-reply-list[settings][filter_by]=all&topic-reply-list[settings][reply_id]=15821750#reply_15821750




FMN 1 over FMN 2?

For the early Exclusion labs, the MS2 turnoff sequence seemed to prefer to aim for whichever FMN was closest to the MS2 and the MS2 turnoff. I have been interested in which FMN to aim for with the aptamer/(MS2 turnoff) sequence, exactly because I saw a bias towards aiming for FMN1. This is something I tried to figure out as part of my last round 3 designs, to see if that would change.


Background article

https://getsatisfaction.com/eternagame/topics/switch-scores-for-eterna-switch-puzzles?topic-reply-list[settings][filter_by]=all&topic-reply-list[settings][reply_id]=15821750#reply_15821750


In the Exclusion 3 lab, round 3, designs thrive with having an MS2 gate. But for round 4 there is perhaps some change. I compared the three designs with the highest score, decent cluster counts and fold change.


In round 4 I managed to make a winner that seemingly targets the opposite FMN to normal, by modifying Salish’s round 3 winning mod of Brourd. There is no MS2 gate forming.


Score  100%, Cluster count 29, Fold change 27.

http://www.eternagame.org/game/browse/5962963/?filter1=Id&filter1_arg1=5988507&filter1_arg2=5988507


Also notice that the two loops are rather evenly sized in both states, something I find interesting. The designs with MS2 gates cut down on the loop thing.


Omei did a mod that got even better fold change. There is a small MS2 gate forming. Also I found it really interesting that the pyrimidine stretch continues after the needed MS2 turnoff pyrimidines.


Score 96%, Cluster count 20, Fold change 34

http://www.eternagame.org/game/browse/5962963/?filter1_arg2=6069248&filter1_arg1=6069248&filter1=Id


Though Omei and I got higher fold change with targeting the opposite FMN(2) to the usual FMN1, the old way of making Exclusion 3 winners, by making an MS2 gate and targeting the FMN closest to the MS2, still works.


Here is my other winning mod of Salish’s round 3 mod of Brourd.


98%, Cluster count 65, Fold change 22 (For this design I trust NUPACK better as the other engines couldn’t show a stable fold - and the data suggests that it probably folded. :) )

http://www.eternagame.org/game/browse/5962963/?filter1_arg2=5966685&filter1_arg1=5966685&filter1=Id


3 of the 6 designs that scored higher than 94 and had a cluster count above 20 were aiming for FMN2 - meaning they avoided making an MS2 gate. 2 designs were targeting FMN1 and making a FMN gate, and for 1 design I couldn’t determine what it aimed for.
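The kind of shortlist used here (high score, decent cluster count, then a look at fold change) can be sketched in a few lines. This is just an illustration, assuming the lab results are available as simple records: the three entries below are the Exclusion 3, round 4 designs quoted above (IDs taken from their links), while the field names, the function name and the inclusive cluster cutoff of 20 are my own assumptions.

```python
# Scores, cluster counts and fold changes as quoted in the post above.
designs = [
    {"id": 5988507, "score": 100, "clusters": 29, "fold_change": 27},
    {"id": 6069248, "score": 96, "clusters": 20, "fold_change": 34},
    {"id": 5966685, "score": 98, "clusters": 65, "fold_change": 22},
]


def shortlist(records, min_score=94, min_clusters=20):
    """Keep designs scoring above min_score with at least min_clusters clusters,
    sorted by fold change (highest first)."""
    keep = [
        d for d in records
        if d["score"] > min_score and d["clusters"] >= min_clusters
    ]
    return sorted(keep, key=lambda d: d["fold_change"], reverse=True)


for d in shortlist(designs):
    print(d["id"], d["score"], d["clusters"], d["fold_change"])
```

Raising `min_clusters` to 21 would drop the design with exactly 20 clusters, so whether the cutoff is inclusive matters for such a small set.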


So I can’t say for sure that totally killing MS2 gates is always the better strategy, as designs can get working with one. With this few designs it is still a note of curiosity. But a curious pointer. :)


The real interesting thing is that designs can get working without the MS2 gate, also in winning lab designs that earlier had them. I count on the coming NG round 2 lab data to tell us more on which is the better strategy.


For now it looks like the winning prize still goes to primarily targeting the FMN1 sequence with the MS2 turnoff. But with the twist that the MS2/FMN turnoff sequence should go for the FMN furthest away from the MS2, if it can avoid making an MS2 gate.


Which leads to MS2 gates heading a bit more towards extinction, or generally getting shorter. Well, except for the microRNA labs - where MS2 gates have thrived like nowhere else I’m aware of. Also some Exclusion labs like Exclusion 3 will still like to have MS2 gates to some degree.



Energy engines and the MS2 gate


MS2 gates have been the hallmark of Exclusion labs from the beginning. However they have been starting to disappear for one more reason than just design choice. Something that sped it up was the addition of two more energy engines, Vienna2 and NUPACK.


I’m starting to think that what the designs mostly want is to be open to switching and not to have long MS2 gates, if any - with the microRNA labs as the exception. I think MS2 gates were easy to use before, as they were the fastest and simplest way of turning the aptamer off and turning the MS2 on - the parts were already really close in both sequence and space, and complementary. Also some of the designs I earlier thought had an MS2 gate probably only had one due to how Vienna showed them.


That is one interesting difference between Vienna and Vienna2: many of the MS2 gates disappear when watching a design in Vienna 2, versus watching it in Vienna.


Example from Exclusion NG 2, state 1 of the round 1 high scorer by Brourd.

Ms2 gate.png

http://www.eternagame.org/game/browse/5851780/?filter1=Id&filter1_arg1=5943087&filter1_arg2=5943087


When thinking about it, forming the MS2 gate carries an energy barrier burden. It takes energy to break. Add on top of that that the MS2 gate gets formed in state 1, the state from which the switch has to move - the MS2 gate isn’t exactly helping the switch on its way - rather the opposite. Plus it mainly supports the MS2 in forming - which already has quite an easy time turning on, so it probably doesn’t need help with that.


Another reason why I believe Vienna2 is more accurate is that Vienna “1” keeps bigger multiloops - and I think multiloops can get too big. Multiloops and other kinds of loops tend to be bigger in the lower scoring designs, compared to the higher scoring ones.
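To compare multiloop sizes between engines or between high and low scorers, one could measure them directly from the predicted structure. Below is a minimal sketch, assuming a pseudoknot-free dot-bracket string; here "size" is taken as the number of unpaired bases inside the loop, which is my own choice of measure, and the function names and example structure are hypothetical.

```python
def pair_table(structure: str) -> list:
    """Return, for each position, the index of its partner (None if unpaired)."""
    partners = [None] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            j = stack.pop()
            partners[i], partners[j] = j, i
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return partners


def multiloops(structure: str) -> list:
    """Find loops closed by a pair that contain two or more inner stems."""
    partners = pair_table(structure)
    found = []
    for i, j in enumerate(partners):
        if j is None or j < i:
            continue  # only visit each pair once, from its opening side
        branches = unpaired = 0
        k = i + 1
        while k < j:
            if partners[k] is None:
                unpaired += 1
                k += 1
            else:
                branches += 1
                k = partners[k] + 1  # jump over the whole inner stem
        if branches >= 2:
            found.append(
                {"closing_pair": (i, j), "branches": branches, "unpaired": unpaired}
            )
    return found


# Hypothetical structure: one closing stem, two inner hairpins, 6 loop bases.
print(multiloops("((..((...))..((...))..))"))
```

Running the same design's Vienna and Vienna2 structures through this and comparing the `unpaired` counts would be one way to check the impression that Vienna keeps bigger multiloops.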


Same design as above but with both states shown and FMN’s highlighted.


I think Brourd got this structural pattern right. (State 2) I see it beginning to take hold in all the Exclusion NG labs.


More often than not, I think Vienna2 is the most reliable for showing multiloop size and which FMN is targeted. Note that NUPACK tends to show MS2 gates more often. It also more often shows turnoff sequences pairing up with both FMN1 and FMN2, compared to the other engines. Could be worth watching out for. NUPACK may end up being the more reliable on some things. I am starting to like it for some designs.


I have mentioned earlier that NUPACK will not always show that a switch has taken place, as the two states come out identical. And thus I have distrusted it. However Vienna2 doesn’t always show a legal fold for designs that our data suggest work just fine. Then I go check NUPACK and Vienna. So I think I will put my advice like this: if Vienna and Vienna 2 don’t make sense, trust NUPACK. And vice versa. :)



Sum up on MS2 gates


While I have been the first to yell MS2 gate for the Exclusion labs, I think we should now use them with a bit more caution.


I think the switch will be more flexible and get higher fold change if it is not relying on having an MS2 gate, especially a long one. Simply having them gone should speed things up.


Again, as with other patterns before it, I will not rule MS2 gates out entirely. I still see them working in some labs more than others, and in some designs better than others.


Rather I will see MS2 gates as yet another lever and handle for us to change weight and balance, as the situation needs it. I think in some cases MS2 gates can be useful as brakes on a reaction, if one imagines a lab situation where one needs things to slow down a bit in one direction of the switching. In other cases, avoiding them will speed things up.




Perspective


So now I wonder what goes on in the microRNA labs then? There both turn-on and turn-off labs show long MS2 gates, whereas among the FMN/MS2 labs only the Exclusion labs show them.


In the microRNA labs, the fold change was high and the MS2 gates ginormous. And yet the long MS2 gates still work. I think it may have to do with the extra length of the added microRNA. That the power of hydrogen bonding, once the microRNA has attached to the design, is enough to force the MS2 gate up on its own - so that the design is not hurt by the extra strongly binding MS2 gate. I wonder if this trend continues.

(Edited)

salish99

  • 295 Posts
  • 58 Reply Likes
Not sure if it's really missing, Eli, or just hidden (i.e. the model doesn't show it, but a slide along one braid could open it again - in reality)?

salish99

  • 295 Posts
  • 58 Reply Likes
So, here are some graphs on the analysis of relative base composition versus scoring.
First, the simple C% graphs that we all loved so much from the first couple of labs.

Then, that got me thinking. How could we improve the predictive capabilities of the bases without shape data?
Here is one of the analyses I ran, looking at the (A%+U%)/(G%+C%) relation.
Note that this does NOT equate with basepair relations, just base composition.
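For anyone wanting to reproduce this indicator on their own designs, here is a minimal sketch of how the ratio could be computed from a sequence string. The function name is made up for illustration, and treating T as U is an assumption on my part for DNA-style input.

```python
def au_gc_ratio(sequence: str) -> float:
    """Base-composition ratio (A% + U%) / (G% + C%) of a sequence.

    Percentages share the same denominator (sequence length), so raw
    base counts give the same ratio.
    """
    seq = sequence.upper().replace("T", "U")  # assumption: treat T as U
    au = seq.count("A") + seq.count("U")
    gc = seq.count("G") + seq.count("C")
    if gc == 0:
        raise ValueError("sequence has no G or C; ratio is undefined")
    return au / gc


print(au_gc_ratio("AAUUGC"))  # 4 A/U bases over 2 G/C bases
```

As stated above, this only looks at composition, so two designs with the same ratio can have completely different pairing.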
Here the result for the entire R95 set:


In order to make the power of this tool more visible, I have split the graphs up into smaller parts:

Interesting here is this "tent shape", an asymptotic convergence on a high score around an (A+U)/(G+C) value (henceforth designated "performance indicator") of 1.74 at ESC 78.3. These are all improvements on similar RNAs that score well, but not impressively high - the impressively high ones are found as outliers at much smaller (A+U)/(G+C) performance indicators.
Let's take a look at the others.


















As can be seen, the peaks in best performing switches vary between the different experiments. It may be that we have to focus on different performance indicators for different molecular aims (e.g. turn-on vs turn-off), or on indicators specific to each experiment. But then the indicators are limited to the exact purpose of the molecule as well as fixed to the length of the RNA, the concentration used etc., so the latter may be too limiting.

Let me know what you think of the analysis so far.

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes
Salish, thx for the graphs! :)

I find it interesting that one can see the designs halted at 30% and 60% as lines through the data in many of the graphs. Quite a lot of designs getting stuck.

I find it interesting that Exclusion 2 and 3 look so different. They have been very different in relation to ease of solving. Exclusion 3 was the easiest to solve.

Actually the trend is that the very easiest lab gets the narrowest ratio.

salish99

  • 295 Posts
  • 58 Reply Likes
In each round, we see accumulation of many designs at the max values for either switching, folding, or binding, thus the 30 and 60 perc lines.

Interesting to me is this tent like structure that I tried to emphasize in the Ex1 results. While this does not lead to the max values, it does seem to indicate a method of optimizing the RNA structure either way to a max of 90ish perc.
Apparently following this RNA optimization method leads to better RNA, but not to 100 perc scores, so the perfect shape cannot be realized.

salish99

  • 295 Posts
  • 58 Reply Likes
This analysis snippet is now in the dropbox - google does not allow me to upload my files as shared documents-can't handle the graphs
https://www.dropbox.com/s/5rg32rye7byuv60/R95_results_edited_LR.xlsx?dl=0

salish99

  • 295 Posts
  • 58 Reply Likes
Below are the other obvious comparisons of performance indicators to be made using the same approach:
(A%+U%)/(G%+C%),
(A%+U%)/(G%+U%), and
(G%+U%)/(G%+C%)





salish99

  • 295 Posts
  • 58 Reply Likes
And, for comparison, here is the data analysis of the R97 results:




salish99

  • 295 Posts
  • 58 Reply Likes
So, I finally got around to combining the performance indicators with the Kd values to see what works for which set of data, and have it all in one graph.
This is the result for the ((A%+U%)/(G%+C%))x(KdFMN/KdnoFMN) subset:

salish99

  • 295 Posts
  • 58 Reply Likes
any comments welcome

Omei Turnbull, Player Developer

  • 966 Posts
  • 304 Reply Likes
Thank you for the graph, Lars.  At first sight, the various shapes looked really intriguing.  

But on closer inspection, it seems like the choice of the X variable is unusual; it is a combination of experimentally controlled variables (% of bases) and results variables (the Kds). So the two "horns" for SS and EX labs are really just a result of the fact that a high Eterna Score requires a large fold change. Fold change is defined as the log of KdOff/KdOn, so for a good SS lab, where KdOff is KdNoFMN, the ratio KdFMN/KdNoFMN is going to be a small fraction, and for EX it will be a large number.
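To make the point concrete, here is a small sketch of the relationship. It is only an illustration: the function and field names are invented, the Kd numbers are hypothetical, and the natural log is an assumption, since the base of the log is not stated here.

```python
import math


def fold_change(kd_fmn: float, kd_no_fmn: float, lab_type: str) -> dict:
    """Relate the plotted ratio KdFMN/KdNoFMN to the fold change KdOff/KdOn.

    Same State labs: MS2 should bind with FMN, so KdOn = KdFMN.
    Exclusion labs: MS2 should bind without FMN, so KdOn = KdNoFMN.
    """
    if kd_fmn <= 0 or kd_no_fmn <= 0:
        raise ValueError("Kd values must be positive")
    if lab_type == "same_state":
        kd_on, kd_off = kd_fmn, kd_no_fmn
    elif lab_type == "exclusion":
        kd_on, kd_off = kd_no_fmn, kd_fmn
    else:
        raise ValueError("lab_type must be 'same_state' or 'exclusion'")
    ratio = kd_off / kd_on
    return {
        "kd_off_over_kd_on": ratio,
        "log_fold_change": math.log(ratio),  # assumption: natural log
        "kd_fmn_over_kd_no_fmn": kd_fmn / kd_no_fmn,
    }


# Hypothetical Kd values (same units), chosen to give a 20x switch each way.
print(fold_change(10.0, 200.0, "same_state"))
print(fold_change(200.0, 10.0, "exclusion"))
```

Both toy designs switch equally well, yet the plotted ratio KdFMN/KdNoFMN lands at 0.05 for the Same State case and at 20 for the Exclusion case, which is the "two horns" effect.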

Can you think of a way to chart this that makes it easier to see causal relationships?

salish99

  • 295 Posts
  • 58 Reply Likes
on the last one, yes.
What frustrates me at the moment is that all graphs we have looked at so far correlate experimentally controlled variables to the results. So far, this yields only hugely widespread predictive ranges (e.g. must use G between 15-65%, just some random numbers here).
What I am aiming at is to narrow this down.
E.g. (G%+U%)/(G%+C%) must be between 1.05 and 1.15.
This combines more of the data into fewer numbers, meaning more calculations and fewer graphs, and overall better quality predictions.
All in all, I favor data-crammed graphs to having to use dozens of individual graphs for showing the same thing. Granted, you only see the graphic representation of a data collection in the individual graphs, so they have their use, but that's for the individual more than for public consumption or publication.

What I aimed at in that last one is to use this predictive factor and combine it back into the experimental data, in order to emphasize (well, it doesn't so far) structures that work well. The Kd/Kd is, as mentioned, the straight fold change variable, so naturally the SS and the Ex experiments are at opposite ends. Only now, the x indicator has been stretched by the experimental input data. If we choose this "stretching" correctly, good structure indicators should jump at us from the graphs.
I am still thinking of how to improve this.

Thanks for the comments, Omei, exactly what I was looking for.

salish99

  • 295 Posts
  • 58 Reply Likes
still my favourite graph in the series so far

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes

Omei and I have started to put our Round 97 lab analysis on a new page:

Round 97 Riboswitch Lab Discussion


Eli Fisker

  • 2222 Posts
  • 483 Reply Likes
Same State 2 - Round 3 + NG - GU’s

I may have mentioned this before, but there is something that I find very interesting in the Same State 2 lab: the winning designs in general have 1 GU base pair in the aptamer gate (second state), even if it is short.


This is only sometimes the case in the 4 bp aptamer gates in the Exclusion labs (they have a forced MS2 mirror sequence demand in the aptamer gate, as half the aptamer gate is the MS2 itself, which limits where an eventual GU could turn up).


The aptamer gates in Same State labs are typically 3-5 base pairs, against 3-4 in Exclusion. (With variations and longer lengths in the harder puzzles.)


Example from Same State 2


100%, cluster count 196, fold change 29.91%, Brourd mod by Salish

http://www.eternagame.org/game/browse/5736176/?filter1_arg2=5742222&filter1=Id&filter1_arg1=5742222


And one from the closely related Same State NG 2 lab


99%, cluster count 23, fold change 23.47, Vinnie mod by me

http://www.eternagame.org/game/browse/5851787/?filter1_arg2=5954731&filter1=Id&filter1_arg1=5954731


Even better, now I think I actually understand why this weak aptamer gate is allowed. For the Same State 2 lab, in state 2 it gets a double boost from both the MS2 molecule and the aptamer molecule binding. So they will ensure that the aptamer gate holds. So I’m guessing that the GU is helping the RNA switch back to state 1 faster, more than it is helping the aptamer gate hold.


Also, in the case of Salish’s designs that have a long aptamer gate (5 base pairs), the GU is probably helping the switch by destabilizing an aptamer gate that could otherwise be too stable to switch back to state 1 again. We want the switches working both ways.


The longer the aptamer gates get and the more GC pairs in them, the harder it is to get a switch. Mostly a gate takes max one GC pair. Which is kind of forced in Exclusion by the MS2 sequence being half of the aptamer gate strand. But even Same State, which has freer options, generally doesn’t take more. Only the harder designs, like those having the switching going on in an open end, actually involve more GC pairs at the aptamer gate.
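Counting GU and GC pairs in a gate stem by eye gets tedious, so here is a minimal sketch of how it could be tallied. It assumes the sequence and its predicted dot-bracket structure for just the stem (or the whole design) are at hand; the function names and the example sequence are hypothetical and not taken from any of the designs linked above.

```python
def pair_table(structure: str) -> list:
    """Return, for each position, the index of its partner (None if unpaired)."""
    partners = [None] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            j = stack.pop()
            partners[i], partners[j] = j, i
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return partners


def pair_composition(sequence: str, structure: str) -> dict:
    """Count GC, AU and GU pairs (anything else goes under 'other')."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = sequence.upper()
    counts = {"GC": 0, "AU": 0, "GU": 0, "other": 0}
    names = {"CG": "GC", "AU": "AU", "GU": "GU"}  # keys are sorted base pairs
    for i, j in enumerate(pair_table(structure)):
        if j is None or j < i:
            continue  # count each pair once
        key = names.get("".join(sorted(seq[i] + seq[j])), "other")
        counts[key] += 1
    return counts


# Hypothetical 3 bp stem with one GU pair in the middle.
print(pair_composition("GGCAAAAGUC", "(((....)))"))
```

For the toy stem this gives two GC pairs and one GU pair, i.e. the kind of gate that would be on the strong side by the rule of thumb above.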


Exclusion 4 does fine; there are designs scoring very close to being winners. Its high scorers don’t have a prolonged aptamer section. But in Exclusion 1, where we struggle to get scores higher than the 80s, the trend is towards longer aptamer gate regions.


I think this comes in relation to which side the MS2 turnoff sequence is at. If it is at the left side of the MS2 as in Exclusion 1 and 2, then the lab is harder. If it is at the right side of the MS2, like in Exclusion 3 and 4, then the lab is easier.


Related post:

Length of aptamer gates - depends on hardness of lab


Eli Fisker

  • 2222 Posts
  • 483 Reply Likes
Get the switching parts close together


Multiloop size


There seems to be an optimal size range for the multiloop holding the switching bits. At least when it comes to MS2.


I think this is largely what shows up in the score difference between the Same State 2 lab and the Same State NG2 lab. The larger loops make for bigger multiloops, and the overall score of Same State 2 far outranks that of Same State NG 2. There are only 3 Same State NG 2 designs among the highest scoring when setting a minimum cluster count of 20. Plus fold change is also better for Same State 2 in general.

https://www.google.com/fusiontables/DataSource?docid=1UZPCfxmv0f9-gnYv4Cfu7Ofz3EJcbI1rpQDD18h-#rows:id=1


To get the switches to work well, the MS2 needs to be put within a good range of the aptamer. But I think there is also an angle component to it. Something that Omei has also been mentioning.


In the Same State labs it isn’t about getting the MS2 as close as possible to the aptamer, as it is for the Exclusion labs. Rather it is about getting it at a certain distance (on both sides), so that both its ends are able to pair up with the aptamer gate sequence and/or parts of the FMN sequence, and in some cases some base hangouts after the aptamer gate.



Distance matters a lot for switching elements. I was discussing this with Omei, who had come to the same conclusion.


The closer one can get the switching elements in space, the better. Although there is an optimal range, as with the Same State designs, which protest if the MS2 gets next to or too close to the aptamer gate. (With Exclusion NG 1 as the least picky lab.)



Protein versus RNA

So here is what I think. Really RNA switches are just like proteins. In proteins, 3 different amino acids that collaborate on making a reaction can be placed in 3 different places far apart in sequence. But as long as the folding brings them together in space, things are fine. :)

As long as they are close in space, closeness in sequence isn't as important.

I also think this is really why the aptamer in RNA designs mostly benefits from being static and anchored in one end, at least to some degree. (Unless it has a very short hairpin loop in the one end.) By locking the one end of the aptamer, one ensures that the aptamer parts stay close in space, while they may not be close in sequence at the other end. So when forced close in space from one end, they are already ready for action when needed.


It is far harder to make a perfect landing two places at once, compared to one. Although in the harder switch labs, it seems okay to have the “static” end of the aptamer split and move a bit and even have a loop inserted 3 base pairs away (as Omei mentioned) - but I think this is under the condition that it is still tailored together further downstream.

I think that a full moving switch, which often does not have the aptamer sequences close in space, all too easily buries the two FMN sequences somewhere by having them bound up to something else, so they easily never get moving and ready for a bind.


Also, when I did try to lock the aptamer sequences in the Exclusion 5 lab - where the FMN and MS2 were too far apart by design (my doing :) ) - I didn’t have much luck with it, even though I intentionally tried to securely lock up the one end of the aptamer by pairing the tails. When the FMN and MS2 are too far apart from each other in an Exclusion lab, the switch needs to go full moving to get enough momentum to solve, and then locking some part is not going to help with the overall solving.

But not only do the aptamer sequences need to be close in space, I also think the MS2 needs to be close to the FMN, to ensure it can do either a direct pairing with one of the two FMN sequences or use an intermediary partner or partners.

Or as Omei said: My suspicion is that those that do well, assuming their scores aren't just statistical error, are because they bring things close together in 3D that we don't appreciate in the 2D representation.

I agree. I think there is an angle thing to what distance is optimal between the switching elements. I think this is what determines what is the optimal range for closeness. This also means that the optimal distance may change should we change switch elements, as I think optimal distance may differ with element and what partner it is given. I think since MS2 has a rather strong personality, it needs to be fairly close, for it to get split. I think it doesn't like to stay split for very long if not given a good reason.

I still in general believe that switching elements need to be very close to the ones they are supposed to switch with. Just angle and distance may vary a bit, depending on which elements are put together and which lab type it is. Exclusion and Same State obviously have different needs, while having things in common too.

The yellow Salish hinge, a stretch of mostly A’s that typically ends up in the multiloop in the switching area in at least one state (although sometimes as a hairpin loop in another), is basically what I think has a limit on its length. If it gets too long, it stops being beneficial for the design. The MS2 gets into a bad position/angle to pair up with the FMN sequences, either directly or indirectly. It creates too much distance in space. I think this is why we are scoring worse in Same State NG 2 than in the original Same State 2. The multiloops got way too big due to excess bases.


salish99

  • 295 Posts
  • 58 Reply Likes
I wouldn't make hinges more than 4 bases in length - at least the designs I tried with this approach didn't work well.

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes
I agree. I think for now that around 3-4 bases is optimal in Exclusion labs. Perhaps with allowances to go longer for the Same State labs, like 3-6 bases. At least our Same State 2 lab lives fine with it. But perhaps it can get in range with the Exclusion labs. Time will tell.
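A quick way to check hinge lengths across many designs could be to scan the sequences for runs of A. This is only a rough sketch: it finds pure A runs (real hinges are "mostly A’s" and may have another base in between), it does not know whether a run actually sits in the switching area, and the function name and example sequence are made up.

```python
import re


def a_runs(sequence: str, min_len: int = 3) -> list:
    """Return (start_index, length) for every run of at least min_len A's."""
    pattern = re.compile("A{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(sequence.upper())]


# Hypothetical sequence with one 4-base run and one 6-base run.
runs = a_runs("GGCAAAAGCCUUAAAAAAGG")
print(runs)
print([run for run in runs if run[1] > 4])  # runs longer than the 4-base limit
```

Positions are 0-based, so they need a +1 to match the base numbering shown in the game.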

salish99

  • 295 Posts
  • 58 Reply Likes
Even the 4 A hinges seem to stumble due to their breaking up working switch designs. See also https://getsatisfaction.com/eternagame/topics/fmn-ms2-riboswitch-structure for a discussion of the result of this investigation

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes

Hi Salish!


Recall where you originally found your hinges in the microRNA 208a lab. This was a single input lab.


I think they work just fine in the single input labs Sensor A MS2 ON and OFF. Especially in designs that follow the word change game pattern that popped up in Jandersonlee’s microRNA 208a winners.


However these hinges do seem partly dependent on the input complement sequence. But I believe at least some of the hinge is helpful as distance. And it should be beneficial that the first pairings between hinge and input complement are rather weak, as it makes it easier for the switch to unfold when the input comes around.


Here are a few examples; some have 3-4 A’s, and one has another base in between. Notice that this input - TB A - just like the original mir 208a, is also rather A filled, and thus calls for lines of U.


Score 100%

http://www.eternagame.org/game/browse/6296746/?filter1_arg1=6330006&filter1_arg2=6330006&filter1=Id


Score 100%

http://www.eternagame.org/game/browse/6296745/?filter1_arg2=6325389&filter1=Id&filter1_arg1=6325389


This one has a clean A hinge and scores 98%

http://www.eternagame.org/game/browse/6296745/?filter1_arg2=6325389&filter1=Id&filter1_arg1=6325389


A good deal of the absolute top scorers in that lab carry some kind of A hinge between the static and the switching part of the design.




Sensor B is less clear cut. Here the hinge acts more like a spacer, as the input complement doesn’t have U’s lined up for pairing and calling for A’s.


96%

http://www.eternagame.org/game/browse/6296750/?filter1_arg1=6335041&filter1_arg2=6335041&filter1=Id


Eli Fisker

  • 2222 Posts
  • 483 Reply Likes

Trap that MS2



MS2 tends to need a double trap - to be captured from both ends for turnoff by two sets of complementary catch sequences.


How strong the individual MS2 trap needs to be to hold back/release the MS2 depends on the hardness of the lab structure and on which side the turnoff sequence is positioned in relation to the MS2. The MS2 is a lazy bugger that does not want to unfold if it can help it, so it has to be held open strongly enough. In Same State labs, MS2 is at its favorite job. It just has to fold up with itself in state 2, which is the easiest job. So there we have to hold it back in state 1.


In Exclusion labs we have to work against the stream and put in extra energy to unfold the MS2 and keep it open in state 2. In Exclusion 2, several of the winners show that they need some extra bases to hold the MS2 tight, compared to Exclusion 3. Exclusion 2 benefited most from removing the MS2 gate in state 1, so as not to have something extra supporting the MS2 in wanting to open up. In Exclusion 3 we could make winners either way, with and without MS2 gates.




Pressured designs, unconnected designs and normal easy designs



Pressured designs

I think switch lab designs can be pressured lab designs too, just like static labs. What makes an RNA switch pressured is having its switch elements placed too far apart in sequence, if there is no option of bringing them close in space by adding static stems in the switching area. This goes particularly for Exclusion labs, whereas Same State labs get stressed both by having the MS2 too close to the FMN and by having it too far away.


Unconnected designs - switching elements not connected in sequence

As I have mentioned, MS2 really likes to be tied up from both ends: preferably in a stretch of bases connected in between the FMN sequences, second best in a stretch of bases that is closed up otherwise, either as an internal loop or with a static stem added, so the MS2 isn't free to move except in a rather limited space close to the aptamer. Exclusion 1 and 4 break that base connection. Those are the lab types I call unconnected.


Imaginary connection drawn in, with broken lines.



The NG labs have shown that the Exclusion 1 lab in particular, and also Exclusion 4, were just missing a way to knot their tails, as I was expecting.


In Exclusion and Same State NG1 and NG3, we can partially make that connection, since the RNA tails are long enough to pair into a static stem and still force the MS2 into a position close to the aptamer.


As the fold changes show so far, this is not nearly as effective as the original Same State 2 setup, with the MS2 between the FMN sequences and a static stem made of the extra bases, even though we have shown that we can solve the NG 1 and 3 labs by knotting the RNA ends into a static stem instead.


Same State NG1 does, however, show an interesting trend of allowing MS2 to get much closer to the aptamer from the left-hand side, compared to other Same State labs.


I think the angles somehow changed. I also think it makes a difference whether the static stem is a hairpin loop stem or a tie-up of the end tails, and whether the switch elements are connected in the backbone or just by hydrogen bonds. Time will tell though. I know I will try to mimic the Same State 2 setup a lot more closely if there is a next NG lab round, and kick far more of the excess bases out into either tails or static stems, so I can get the multiloops nice and small.


jandersonlee

  • 549 Posts
  • 122 Reply Likes
I looked at Eli's winning "Sensor v3, turn-off variant 1" design
http://www.eternagame.org/game/browse/5750152/?filter1_arg1=5819842&filter1_arg2=5819842&fil...

using the NUPACK multi-strand online tool (http://nupack.org/partition/new), as pointed to by Nando, and found that it does seem to work in the model, though perhaps at higher concentrations than were tried in the lab:

sensor 3, v1 - 87 (Eli Fisker)
design UAUUAACAAGUAAGAUCCCACAUGAGGAUCACCCAUGUGCUCGUCUUAUAAGGUACUGUGACGAAAGUCACGGUACC
sensor AUAAGACGAGCAAAAAGCUUGU
1pM 0nM
1    20    1    38    0.9964000
1pM 100nM
1    20    1    38    0.9843819
1pM 200nM
1    20    1    38    0.9726543
1pM 1uM
1    20    1    38    0.8881355
1pM 10uM
1    20    1    38    0.4527750
1pM 100uM
1    20    1    38    0.0871351

20:38 is the A:U pair at the head of the MS2, so the midpoint of the turn-off is predicted at a bit less than 10uM.
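
For anyone who wants to put a number on "a bit less than 10uM", here is a minimal Python sketch. It assumes the last column in the listing above is the NUPACK probability of the 20:38 pair and that interpolating linearly in log10 of the sensor concentration is good enough between two neighboring points. The function name `turnoff_midpoint` and the 0.5 threshold are my own choices for illustration, not anything NUPACK or the lab defines. The 0 nM row is left out because its logarithm is undefined.

```python
import math

# (sensor concentration in mol/L, probability of the 20:38 pair),
# copied from the listing above; the design strand is at 1 pM throughout.
pair_prob = [
    (100e-9, 0.9843819),
    (200e-9, 0.9726543),
    (1e-6, 0.8881355),
    (10e-6, 0.4527750),
    (100e-6, 0.0871351),
]

def turnoff_midpoint(points, level=0.5):
    """Return the concentration where the probability crosses `level`,
    interpolating linearly in log10(concentration) between the two
    bracketing points. Returns None if the data never cross `level`."""
    for (c_lo, p_lo), (c_hi, p_hi) in zip(points, points[1:]):
        if p_lo >= level >= p_hi:
            if p_lo == p_hi:
                # Both points sit exactly on the level; no slope to use.
                return c_lo
            frac = (p_lo - level) / (p_lo - p_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None

midpoint = turnoff_midpoint(pair_prob)
print(f"estimated turn-off midpoint: {midpoint * 1e6:.1f} uM")
```

The crossing falls between the 1uM and 10uM rows, so the estimate depends only on those two points; more concentrations in that decade would tighten it.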

salish99

  • 295 Posts
  • 58 Reply Likes
I wonder if the AU is a coincidence. We have now found in our endbits investigation that the final Pos 1:84 bases on a short stub end just behind an AU 2:83 pair did wonders in estimating scores by the nWCP/WCP chemistry - and here you have an AU heading the MS2.

whbob

  • 190 Posts
  • 57 Reply Likes
The most recent lab scores returned still report just a single score number (from 0 to 100).
If a design solution had a score of 30, how does that relate to the three switch subscores? Would a score of 30 have to come from a good baseline, because there has to be a good baseline before a good fold score? Is there any significance if there is a narrow horizontal row where 30, 60 and 90+ scores line up?

Omei Turnbull, Player Developer

  • 966 Posts
  • 304 Reply Likes
We're working on a new browser that will let you see much more in the way of details while staying in the Eterna UI.  But for now, Johan and players are filling in the gap with Google spreadsheets and fusion tables.  The general pattern is that Johan creates an initial spreadsheet with the measurements and stats derived from them, and other players, most notably Meechl, will extend that spreadsheet with more columns of interest.  Then, someone, typically Eli or I, will load those spreadsheets into fusion tables.  (Fusion tables make it easier to sort/filter/summarize/graph the data.)

The most up-to-date fusion table for R97 that I know about can be accessed with the URL http://tiny.cc/Eterna_R97_Fusion; that will be redirected as updates occur. Pointers to other player-generated contributions to lab analysis can be found in the Wiki at http://eternawiki.org/wiki/index.php5/Lab for labs in general and https://getsatisfaction.com/eternagame/topics/round-97-riboswitch-lab-discussion for R97 specifically. As always, the Wiki is only as complete as we players make it, so the lack of information there (e.g. R96) doesn't mean it doesn't exist.

whbob

  • 190 Posts
  • 57 Reply Likes
Thanks Omei. I had missed Meechl's fusion table. It's great! Lots to think about :)

Astromon

  • 182 Posts
  • 23 Reply Likes
Meechl's fusion table is the best ever!!!!!

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes
Just wanted to leave a note that Omei has opened the Round 98 discussion at this page, related to the new NG round 2 lab data we have recently received.

Link to Round 98 lab discussion

Link to Round 97 lab discussion