We have recently finished our pilot experiments with great initial success. Using a new technique that measures switching on a sequencing chip, we directly observe the switching for thousands of designs at once. The signal is generated by a fluorescent RNA binding protein, MS2, and instead of the standard EteRNA score, which is based on the correct folding of each base, we have introduced a new Switch Score.
The Switch Score (0 - 100) has three components:
1) The Switch Subscore (0 - 40)
2) The Baseline Subscore (0 - 30)
3) The Folding Subscore (0 - 30)
The scoring scheme is summarized below. A more detailed description is given in this PDF:
A typical example of a switch puzzle is shown below:
The player designs the structures in [1*] and . To observe the switching we then measure the fluorescent signal of MS2, which binds specifically to the MS2 hairpin seen in . In the absence of FMN, the MS2 should bind and the switch is ON. On the other hand, if we introduce FMN, the ligand in [1*], the switch should be OFF and not exhibit fluorescence.
No switch is 100% ON or OFF in the absence or presence of ligand, but a good switch can come very close (and get a perfect EteRNA Switch Score!). At some MS2 concentration, the difference should be large (e.g., at ~100 nM MS2 in figure below). In practice, we don't know this concentration beforehand, so instead we perform measurements at many concentrations to obtain binding curves. When the switch turns OFF (red curve), the effective dissociation constant increases. The dissociation constant, Kd, is the concentration where half of the RNA binds MS2.
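To make the binding curve idea concrete, here is a minimal sketch. It assumes a simple one-site binding model, where the bound fraction is [MS2] / ([MS2] + Kd); the actual curve fits used for the lab data may differ. The two Kd values are made up purely for illustration.

```python
def fraction_bound(ms2_nM: float, kd_nM: float) -> float:
    """Fraction of RNA bound to MS2, assuming simple one-site binding."""
    return ms2_nM / (ms2_nM + kd_nM)


# Hypothetical Kd values, chosen only for illustration (not measured data).
KD_ON = 10.0     # nM, without FMN (blue curve)
KD_OFF = 1000.0  # nM, with FMN (red curve)


def signal_difference(ms2_nM: float, kd_on: float = KD_ON, kd_off: float = KD_OFF) -> float:
    """Gap between the ON and OFF curves at one MS2 concentration."""
    return fraction_bound(ms2_nM, kd_on) - fraction_bound(ms2_nM, kd_off)


if __name__ == "__main__":
    # Scan several concentrations, as one would when measuring a binding curve.
    for conc in (1, 10, 100, 1000, 10000):
        print(f"{conc:>6} nM  ON={fraction_bound(conc, KD_ON):.2f}  "
              f"OFF={fraction_bound(conc, KD_OFF):.2f}  "
              f"diff={signal_difference(conc):.2f}")
```

In this simple model the gap between the two curves is largest at the geometric mean of the two Kd's, which is 100 nM for the made-up values above. At a concentration equal to Kd, exactly half of the RNA is bound.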
The Switch Subscore quantifies how far apart the Kd's are in the absence and presence of FMN (horizontal distance between the red and blue curves).
The Baseline Subscore is a measure of how close the ON-state is to the original MS2 hairpin (lower Kd is better, i.e., blue curve should be far to the left).
The Folding Subscore is high if MS2 binds properly in the ON-state at any concentration (the score should be high for the blue curve at high concentrations of MS2, i.e., high values to the right).
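As a rough sketch of how the three parts add up: this assumes the total is simply the sum of the three capped subscores, which the ranges 40 + 30 + 30 = 100 suggest. How each subscore is derived from the binding curves is described in the PDF and is not reproduced here; the function names are hypothetical.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Keep value inside the closed range [low, high]."""
    return max(low, min(high, value))


def switch_score(switch_sub: float, baseline_sub: float, folding_sub: float) -> float:
    """Total Switch Score, assumed to be the plain sum of the capped subscores."""
    return (clamp(switch_sub, 0, 40)       # Switch Subscore, 0 - 40
            + clamp(baseline_sub, 0, 30)   # Baseline Subscore, 0 - 30
            + clamp(folding_sub, 0, 30))   # Folding Subscore, 0 - 30
```

Under this assumption, a design with a well-formed MS2 hairpin in the ON-state but no measurable switching would land at 0 + 30 + 30 = 60.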
In our first experiments, we found that the easiest score to maximize is the Folding Subscore, followed by the Baseline Subscore. These two ensure that the MS2 hairpin is properly formed in the ON-state. The hard one is the Switch Subscore, which is the highest when the energy difference between the states is finely-tuned to the energy conferred by binding to FMN (or other future ligands).
While doing the color marking of the repeats in riboswitches - as mentioned in the above post - I noticed a peculiar pattern that appeared to be around in many of the switches. I noticed it because I was looking for patterns in the positions of the G segments compared to each other. A lot of the switches had at least 1, sometimes more, of their G segments really close, whereas the C segments often also had some closeness, but not to the same degree. And this pattern kept turning up: (Y)GGNNGG, (Y)GGNGG, to some degree also a reduced pattern (Y)GNNGG, (Y)GGNG and variations on that theme (Y signifying pYrimidine - meaning C and U bases). Regularly there is a pyrimidine at the end too, plus at the beginning of the second G repeat.
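For anyone who wants to scan sequences for these patterns, here is a minimal sketch using regular expressions. The function name and output format are my own, N is taken to mean any base, and the leading pyrimidine is treated as optional and only reported as a flag.

```python
import re

# Core patterns from the post; N is read as "any base".
MOTIFS = {
    "GGNNGG": "GG[ACGU]{2}GG",
    "GGNGG": "GG[ACGU]GG",
    "GNNGG": "G[ACGU]{2}GG",
    "GGNG": "GG[ACGU]G",
}


def find_g_repeat_motifs(seq: str):
    """Return sorted (start, motif, matched_text, has_leading_pyrimidine) tuples.

    Positions are 0-based. A lookahead is used so overlapping hits are reported.
    """
    seq = seq.upper().replace("T", "U")
    hits = []
    for name, core in MOTIFS.items():
        for match in re.finditer(f"(?=({core}))", seq):
            start = match.start()
            leading_y = start > 0 and seq[start - 1] in "CU"
            hits.append((start, name, match.group(1), leading_y))
    return sorted(hits)


if __name__ == "__main__":
    # Made-up test sequence, not a real riboswitch.
    for hit in find_g_repeat_motifs("AACGGAUGGAA"):
        print(hit)
```

Note that the reduced patterns also match inside the full ones (a GNNGG hit can sit inside a GGNNGG hit), so the raw hit count overstates how many separate sites there are.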
Peculiarly enough, this G segment repeat pattern often appears to land in the switching area, and I can also find it in a number of the Eterna switch winners, both FMN and TEP, although not all. There the pyrimidine start gets lost, due to the locked FMN sequence, as this is where these two close double G repeats often turn up. One of the G repeats is often in a stem and the other in a loop. Which reminds me of something else. A number of the G and C repeats are placed in loops, and I think they are left there to initiate the switching.
Later I ran the naturally occurring riboswitches through Vienna RNAfold, to see if this sequence would land in a high entropy area, and it regularly does.
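For reference, here is a small sketch of what "entropy per position" means. It assumes the usual Shannon form computed from base pair probabilities, with the unpaired probability included as one of the outcomes. The choice of log base 2 and the toy probability matrix are assumptions for illustration; in practice the matrix would come from a folding program's partition function output.

```python
import math


def positional_entropy(pair_probs):
    """Shannon entropy per base from a symmetric base pair probability matrix.

    pair_probs[i][j] is the probability that base i pairs with base j.
    Whatever probability is left over is treated as "base i is unpaired".
    """
    entropies = []
    for row in pair_probs:
        unpaired = max(0.0, 1.0 - sum(row))
        outcomes = [p for p in row if p > 0.0]
        if unpaired > 0.0:
            outcomes.append(unpaired)
        entropies.append(-sum(p * math.log2(p) for p in outcomes))
    return entropies


if __name__ == "__main__":
    # Toy 4-base example: bases 0 and 3 always pair, bases 1 and 2 pair half the time.
    toy = [
        [0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, 0.5, 0.0],
        [0.0, 0.5, 0.0, 0.0],
        [1.0, 0.0, 0.0, 0.0],
    ]
    print(positional_entropy(toy))
```

A base that always does the same thing gets entropy 0, while a base that is torn between several partners (or between pairing and staying single) gets a high value. That is the kind of area where I would expect the switching to happen.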
>Magnesium riboswitch mgtA: E. coli. Alteration: Normal.
The pseudoknot riboswitches mostly seem to be exempt from this sequence pattern. Perhaps they have another switching mechanism?
The two close G repeats often work this way: in one state, one repeat is embedded in a stem and the other in a loop area, and then the repeat G’s in the loop help as an anchor for the shift to the other state. This is similar to the twin G’s in FMN, where the twin G’s are in the aptamer loop but often get bound up when the state is shifted.
Now I think I finally understand why there are so many G and C repeats in riboswitches. I think the C repeats help raise entropy, as do U repeats. Plus, when one makes the entropy of the design higher through repeat sequences, and thus greatly raises the probability that the design can fold into many structures other than the target structure(s), one also needs to make the binding parts stronger = lots of GC = lots of G and C repeats.
I think one can balance the entropy at a good level by using the right amount and types of repeats. I even think there is a different frequency for each kind of repeat. C and G repeats occur to a much higher degree, where normally A repeats and to some degree U repeats dominate in static puzzles. I think the ratio between the different base repeats matters.
Consider the following. It is Chimera's "ladder" rendition of the 3D structure of
a hairpin from the human 7SK snRNA in complex with arginine.
(I selected it just as an example of an RNA that does not seem to rely on changing its shape to perform its function.)
This model comes from NMR imaging, which is capable of "seeing" multiple configurations that the RNA takes on. (Unlike X-ray crystallography, which requires that all the molecules be "frozen" into one configuration, so a crystal can form.)
All the configurations are superimposed here, and you can see that for the most part, the differences are small. (I've called out the one exception, where the uracil bulge will occasionally form a hydrogen bond with the uracil on the other side of the helix.)
In contrast, consider the following NMR model, which is for a riboswitch (specifically, a preQ1 riboswitch in the bound state).
The image in the upper left shows all the configurations. Notice how much more variety (i.e. entropy) there is. In particular, there is a lot of switching of specific hydrogen bonds, while the overall structure remains essentially unchanged. The other three quadrants each show just one of the 21 states that are superimposed in the first quadrant.
What I think is happening is that this local variation in states (substates?) forms a "broad energy valley" of states that increases the stability of the general shape more than the single minimum free energy value suggests.
But entropy is a two-edged sword. If there is a lot of possible variation that stabilizes each of the two desired states (e.g., in the Exclusion case, one and only one of the FMN and MS2 being bound), that is good. But variation that allows for neither of them to be bound would result in a "mushy" switch, which wouldn't get good Eterna scores.
One example (though not necessarily a great one) might be JL ENG3 1.03 in the Exclusion NG 3 lab, where forming the aptamer loop requires breaking one bond in the MS2 arm so that the closing pair can form for the loop. As further pairs break in the MS2 arm, more pairs can form in the stem holding the aptamer loop closed, until enough pairs have formed so that the rest of the arm is more stable in a different configuration.
The zipper style seemed to work well for the miRNA lab at least.
I have some more I have been thinking about and hope can benefit our MS2 designing.
Closeness of switch elements matters
While I think FMN and MS2 were too close together (next to each other or 1 base apart) in the first MS2 exclusion labs, I think they do need to be quite close - if not in sequence - then at least in 3D space. Several of the better designs from the labs in the second round specifically inserted a static stem between the MS2 and the FMN element that was furthest away from it. Also in the microRNA labs, the MIR complement gets brought close to the MS2 sequence.
It seems to be a principle that to get a switch to happen, the two elements that connect to either molecule or ligand, be it FMN, MS2 or mir, need to be brought in close range, and then some part of them will often pair up close to each other, each one taking a turn to turn the other on or off. In the Same State labs, by contrast, they both helped each other turn on or off, by pairing directly with each other for turn on or turn off.
But still the aptamer has quite specific wants for its gate sequence, and so may MS2 to some degree. (https://getsatisfaction.com/eternagam...)
In the MS2 player puzzles (computer simulation) it is my experience that MS2 does not really need to have an MS2 gate to solve. At least not with FMN around. It only does in the logic gates puzzles. I think the reason for that is that both mir sequences get brought really close to the MS2 sequence. Actually they have a complement on either side of the MS2 sequence, which works both as turn on for the individual mir and as turn off, when the mir complements pair up with each other in front of the MS2, prohibiting the mir sequences from pairing up with their complement. On top of that, the left sequence after the MS2 is not only turn on for one of the mirs, plus turn off for both of the mirs, but also a turn off sequence for the MS2. No wonder this inverted XOR puzzle is a grumpy one and doesn't like changes.
Even the microRNA labs have quite specific wants for the MS2 closing doors. They seem to be somewhere halfway between complementary to parts of MS2 and complementary to the microRNA. I simply think there is a limited number of legal solves that can both turn on and off on their own (not involving things like FMN). OK, when involving part of the microRNA - which is a changeable variant - the complement stretch between MS2 and MIR will have to change accordingly.
But generally the MS2 hairpin gates do not allow much change - although more than the FMN aptamer. They are typically a bit longer than the aptamer gates.
I even had problems interchanging MS2 surrounding solves from other lab puzzles, except if the MS2 sequence has a somewhat similar position in the puzzle. So for some puzzles there will only be a few good variants of MS2 gates. This at least seems to be the case for the logic gates puzzles - the inverted one in particular. But knowing when and why there may be few could be kind of like a toolbox. And perhaps later something to teach the robot, if we find when and where it can reuse past good solves with good success. Plus we may also learn something about good distance from the MS2 gate doors to the rest of the puzzle.
When I took a look at the switch archetypes drawings for the high scorers in the MS2 switch lab (https://getsatisfaction.com/eternagam...) I found that some pair ups were more likely to happen than others.
The early FMN sequence (FMN1) is more likely than the late one (FMN2) to be involved in the switch mechanism - bound when the aptamer is not in use. And generally both ends of MS2 liked to be paired up somewhere when the MS2 hairpin shouldn’t form. Rough numbers below.
Also note that, despite counting from the drawings, each lab design type doesn’t have an equal number of solves, as the drawings capture the pattern of the high scorer, which may be a single design, but also the solve style among the majority of the winners.
While I say that MS2 likes to be close to one or both of the FMN sequences, I also think that different solving styles take different distances. The complementary solving style seems to take a greater distance than the magnet solving style.
Only with the coming results from the last round will we be able to see more about which solve types will dominate among the winners.
I have been talking a lot about MS2 gates. I think I can now say more about where and when they like to happen.
Background on MS2 gates
I have been working on the NG Same State labs first, as I think, based on past experience, that those are the ones we will have an easier time getting to work. However, yesterday I started submitting designs for the Exclusion NG1 lab, when a thought popped up. The designs I worked with seem to keep insisting on having a MS2 gate door. Something that was not to the same extent the case for the NG Same State labs.
I think that MS2 gates are more needed in the Exclusion type lab - or in other words in the turnoff labs - where MS2 has to form in 1 state and get turned off in 2 state. I think MS2 gates will be less frequent in the turn on labs where MS2 should not be present in 1 state.
Now the distance, or rather lack of same, between aptamer sequence and MS2 was forced in the early exclusion labs, which means that the MS2 gate would often happen as a means to get a sequence to pair up with the one FMN sequence that was next to the MS2, which was a pretty good way of ensuring - 1: that MS2 formed, 2: that FMN didn’t form. But now we can choose the distance and the MS2 gate thing still very often happens. One way or the other.
MS2 and shield bases
Now I have been talking about MS2 gates forming for the turn off labs. But in several cases a MS2 gate doesn’t form at all. It can also regularly be either an internal loop or a multiloop, but with something added. What happens is rather shield bases that are put at both sides of the MS2. (I think the shielding base can also be helpful for turn on labs to avoid single base areas pairing up with each other.)
So contrary to the usual mainly A in single base area, what happens are base patterns that specifically shield each other from pairing up and becoming stem.
The bases don’t pair with each other, but rather ensure that the sequences just around the MS2 don’t pair up.
It can be some G bases on either side, or more often it is U bases. U’s seem to work great for shielding.
I have also been talking about U base shielding for a while - but for static designs. Stretches of U’s spread in specific frequencies in bigger single base areas that prevent the single base area from pairing up with each other and forming stems. It was rather helpful for big hairpin loops.
Loop shield of U’s and nearest stem effect...
Shielding bases, sliding and the first switch lab
It also seems that having a few repeat U’s in the single base area between sections that need to get moving is great for creating slide and action.
Even other color bases spaced out by A’s seem to help with getting movement. I recall our first switch lab. The round 2 high scorer Tebowned (89%) by Mikestrange had an odd pattern of spaced out G’s in the single base area, which was unusual for single base areas in good static designs.
Some of us did try to remove these G's and replace them with A’s - because it looked nicer. :) However mostly the design got very grumpy about it. :) As can be seen from the designs sorted after score, most of the high scorers retained the weird pattern.
It makes more sense now that we know that many A’s and long repeats of them seem to be a cardinal sin in switches, to a higher degree than in static designs.
MicroRNA and MS2 gate doors
Back to MS2 gate doors. Both the MS2 gate doors and the shielding behaviors are much more pronounced in the turn off labs compared to the turn on labs.
But the microRNA labs have MS2 gate doors in both turn on and turn off labs. What is going on?
I think the fact that the microRNA (ligand) is not part of the design sequence itself is the reason for that.
I think the seeming need for having a MS2 gate form or shielding bases around the MS2 is what makes the turnoff designs (MS2 forming in 1 state) harder than the turn on labs (MS2 not formed in 1 state).
First, it became clear during designing submissions for the lab, that we needed a good "hinge", around which the designs could move freely between the states to score high. I tried around with what base that should be made of, it seemed obvious that we'd need at least 3, maybe 5 bases of the same type to do the trick.
It could not be U's, as we needed 5 U's in a row to satisfy the mirna coming along, and twice 6ish U's in a row struck me as strange.
My hypothesis was that it shouldn't be A's, since the yield would be extremely low for these designs. Well, I was proven wrong, it was A's after all.
Let's take a look:
I analyzed the five best scorers:
and the one design of mine that sucked the most, for comparison.
The first three follow similar lines, and they all use the same hinge:
and No. 2
and No. 5
All excellent designs that use the smart hinge trick. The A's indicated by the arrows pivot between the states, acting as the flexible connection between the main hairpin and the short hairpin in state 1, and as a separator piece in state two, allowing the mirna to dock solidly to the unravelled hairpin.
Consequently, having such a pivot hinge is critical in allowing good attachment for this micro RNA.
Now, we come to one of the surprises, number four.
By adding a miRNA non-compatible A on position 29, as indicated by the arrow, the miRNA gets effectively split. Now, I have tried that in many positions, but splitting up the 24-28 row of abovementioned Uracils never resulted in a good design, so I tried to split it right after or right before this row of what I deemed critical connection points for the miRNA (the 5 U's). In this case, it worked marvellously. And not only that, the score is insanely high, which goes to show that the miRNA can be folded along a well-offered RNA connector, and does not have to line up as a straight one half of a hairpin. That's very good to know, as it offers switch designers more choice and variety in the making of the molecules.
So, now let's compare this with my design 009. At 60/100, one of the three criteria scored 0 points. While I don't know which, my hypothesis would be that the Kd value difference is low.
And one can immediately see why it is such a bad design. The available stretch of similar bases is too short. Likely, the molecule is sterically or isometrically hindered in switching between the states, which may make docking for the miRNA difficult, and thus results in low Kd scores. This problem is aggravated by having a solid wall of immovable RNA glue (GC) on the 4-9/16-21 stretch, which allows for no shifting in the molecule.
Your thoughts on this analysis are welcome.
Salish, you got me inspired. :)
I think what you are saying with the hinge thing is that the microRNA uses it as a hinge for docking. This made me think about that loose tail of single bases at the end of the design. It is kind of unusual to have this many non A’s dangling and not doing anything in one state, even in a static design. Usually they would go look for some action.
So the winners have this microRNA complementary dangle. It's usually a complement to the early stretch of the microRNA, although it can be to the late part too. I see this dangling complement missing in some of the low scoring designs.
This happens not just in the 208a lab, but also the turnoff ones.
I simply think this dangling end works as landing spot for the microRNA and when first there, it can force the rest of the design open for full attachment.
Often either end of the microRNA attachment is rather weak. I noticed, from the mods of a past round winner where I added GU's all over the design in turn, that it was here that GU’s were more often welcome than further in.
Small mods - what is preferred?
Length of the tail
I have seen the dangling tail - that is not involved in the design when the microRNA is not present - be anywhere between 4 and 12 bases long, counting the dangling bases that seem to pair up with the microRNA when the microRNA is around.
Zipper 46 - 13 (100%) (Your No. 1 image)
This dangling even happens in the minority solve no 4 and its siblings. The number 4 you were wondering about is one of the rarer but wellworking minority solves that takes a different road.
Sensor 100 (99%) (Your No. 4)
I have a few pictures of the visual difference between them here:
Minority versus majority archetype
This minority still has its line of U’s that are complementary with the microRNA A’s. But the design itself is kind of reversed. It has its static stem in the opposite end of the design compared to the majority of the winners.
Dangling tail early or late
If the dangle is not in the one end of the design, it is in the other. I think late dangles are the most common.
Sensor 3, v1 87 (100)%
I wonder if there is a pattern for the dangles. The minority ones seem to have the weaker and shorter miRNA complementary stretch dangle. I think this might be why there are sometimes minority and majority archetypes of solve types for the labs.
I also wonder if there is a pattern for which type of labs has the longest dangling tail. The 208a lab seems to have the longest dangles. Plus I managed to find some designs in the 208a lab which seemingly had the dangle pair up with itself. But again, these had pretty long dangles like 12 bases long.
Here is one of them:
Out of curiosity I ran MS2 sequence through Vienna and it seems to be in the high end of entropy for a static stem.
And appears to be less stable at both closing ends. Which isn't too bad, when it has to get moving. :)
1. A Series of Mutations
Player salish99 submitted a number of point mutations to uracil, for a sequence in round 95, in the same state 2 riboswitch puzzle. Only those residues unlocked or not originally uracil were mutated.
The following images are a visual representation of the Eterna scores for these mutations, mapped against the predicted secondary structure targets for both the OFF and ON states. It would probably be better to map other statistics, such as KDON and KDOFF to these images, however, the average number of clusters for this entire set was approximately 8 clusters, so the analysis following should probably be considered purely speculative and an example of the usefulness of this data with better statistics.
The color scale is set with 100-80 being green to yellow, 80-50 being yellow to red, and all values lower than 50 being red, except those values without any point mutation data, which are colored grey in this instance.
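For anyone wanting to reproduce a similar coloring, the scale can be sketched as a piecewise linear blend. The exact RGB values and the linear interpolation are assumptions; only the breakpoints (100, 80, 50) and the grey for missing data come from the description above.

```python
# Assumed RGB anchors; the actual images may use different shades.
GREEN = (0, 255, 0)
YELLOW = (255, 255, 0)
RED = (255, 0, 0)
GREY = (128, 128, 128)


def blend(start, end, t):
    """Linear blend between two RGB colors, t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def score_to_rgb(score):
    """Map an Eterna score (or None for missing data) to an RGB color."""
    if score is None:
        return GREY                   # no point mutation data
    if score < 50:
        return RED                    # everything below 50 is plain red
    if score >= 80:
        t = (min(score, 100) - 80) / 20
        return blend(YELLOW, GREEN, t)  # 80 = yellow, 100 = green
    t = (score - 50) / 30
    return blend(RED, YELLOW, t)        # 50 = red, 80 = yellow
```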
From a purely hypothetical and speculative perspective (again, the data for these would most likely not be the most robust), we can see that several point mutations to uracil caused a significant drop in score/activity for the riboswitch, including several single stranded regions that would appear to have no obvious effect on the riboswitch or change in secondary structure. In addition, the initial closing base pair of the FMN aptamer is important as well. Finally, in the first helix of the design, disruption of the G-C base pairs appears to also cause a significant change in score.
If we were to apply the M2 destabilizing mutations to some of the "winning" sequences every round, including those that are a part of the MS2 hairpin and FMN aptamer, information such as this would be incredibly useful for understanding the nature of the chip riboswitches, the robustness of these sequences to mutations, as well as a general understanding of RNA ensembles. Especially if the data was based on sequences with far more accurate and precise results.
Here is a link to the histograms, sequences and mutations used for this
I was watching my mods of the high scorers in round 2 of the MS2 labs, and I was wondering, because a lot of my designs that did get a higher score than the original design then removed themselves from my data set by getting cluster counts so low that I likely cannot trust the data.
For the time being I’m primarily interested in cluster counts of 20 and above and error rates below 1.4, as Omei set it in his new fusion table, plus Johan's warning that 10 clusters may not be enough.
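The filter I'm using can be sketched like this. The field names and the example rows are hypothetical and are not the actual column names or data from the fusion tables; only the two thresholds come from the text above.

```python
from dataclasses import dataclass


@dataclass
class DesignResult:
    name: str            # hypothetical field names, not the fusion table columns
    score: float
    cluster_count: int
    error_rate: float


MIN_CLUSTERS = 20   # cluster count at 20 and above
MAX_ERROR = 1.4     # error rate strictly below 1.4


def is_trustworthy(design: DesignResult) -> bool:
    """True if the design passes both data quality thresholds."""
    return design.cluster_count >= MIN_CLUSTERS and design.error_rate < MAX_ERROR


def trusted(designs):
    """Keep only the designs whose data can reasonably be trusted."""
    return [d for d in designs if is_trustworthy(d)]


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [
        DesignResult("mod A", 95.0, 8, 0.9),     # high score, too few clusters
        DesignResult("mod B", 88.0, 45, 1.1),    # passes both filters
        DesignResult("mod C", 91.0, 30, 2.3),    # error rate too high
    ]
    print([d.name for d in trusted(rows)])
```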
Fusion Tables by Omei
Round 88 + 93
I noticed something else. More of my seemingly fine high scoring designs in the Same State labs had low cluster counts, compared to those in the Exclusion labs. This got me wondering whether there were differences between the labs and their cluster counts.
When I look at the data with Omei’s fusion tables from Round 93 and 95, this seems to be the case.
Cluster count versus lab type and score
Higher cluster count in Exclusion labs
Lower cluster count in Same State labs
The irony is that the labs which have the higher Eterna scores in general are also the ones with the lower number of clusters.
Switch direction versus cluster count
Since the Same state labs differ from the Exclusion labs, by having different switch direction I used this for sorting, against cluster count. (The Same State labs have MS2 gone in first state, where Exclusion labs have MS2 forming in 1 state)
Exclusion labs left, Same state right
Exclusion labs left, Same state right
MicroRNA’s and cluster count
The microRNA labs also have higher cluster counts in the turnoff labs, but the trend is reversing.
Turnoff labs to the left, Sensor 208 to the right.
So why do some labs get higher cluster counts than others?
However having a high cluster count is absolutely no guarantee of a good score. Quite the contrary.
Now all this got me wondering if there was anything characterizing those designs that did end up with a high score.
I picked out the Same State 2 lab and the Exclusion 4 lab as those were the two labs in each category with the highest average score.
I checked through the ones that had cluster counts higher than 100 in the Same State 2 lab. What was true for those was that they were mostly or totally full moving switches, whereas the main part of the designs in that lab that do end up with a good cluster count and a great Eterna score are partial moving switches.
I checked the designs with above 300 clusters in the Exclusion 4 lab. There were some full moving switches, but they were a minority. The main part of the top scorers of this lab with decent cluster counts were also partial switches.
Other differences between these two labs:
Exclusion 4 has the MS2 dangling on the “outside” of the design, where Same State 2 has it more embedded inside the sequence.
What else is different between these labs is that Exclusion 4 tends to have dangling tails, whereas Same State 2 prefers its tails paired with each other.
So basically I don’t know why the one type of lab (Exclusion - turnoff type) has higher cluster counts than the other (Same State). I just find it very interesting.
I've observed what seems to be a very strong pattern for constructing good FMN switches. It entails forming, in the unbound state, one (but only one) end of the (bound) aptamer interior loop.
There are two ends to the aptamer loop, which I've started calling the near half-aptamer and the far half-aptamer. The whole explanation seemed too long to merit posting in full here, so here's the link to the document.
I'll include the final graph from the document here. In the R95 Same State lab, after filtering the designs on other criteria that I've observed to be correlated with good switch scores, I got this:
Of the 65 designs that passed the other filters, the average score of the 37 designs that conformed to the near half-aptamer pattern was 91.6, compared to 70.7 for the 28 designs that didn't.
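As a quick check of the arithmetic, using only the group sizes and averages quoted above (the function name is mine):

```python
# (number of designs, average score) for each group, as quoted above.
conforming = (37, 91.6)       # followed the near half-aptamer pattern
non_conforming = (28, 70.7)   # did not follow it


def pooled_mean(groups):
    """Average over all designs, weighting each group mean by its size."""
    total = sum(count * mean for count, mean in groups)
    designs = sum(count for count, _ in groups)
    return total / designs


gap = conforming[1] - non_conforming[1]
overall = pooled_mean([conforming, non_conforming])

if __name__ == "__main__":
    print(f"gap between groups: {gap:.1f} points")
    print(f"average over all 65 filtered designs: {overall:.1f}")
```

So the conforming designs sit about 21 points above the non-conforming ones, and roughly 9 points above the average of the whole filtered set.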
Generally most of the MS2 labs so far have primarily favored partial moving switches. However, a hallmark of the top scorers in the new Exclusion 5 and Exclusion 6 is full moving switches, or close to it. These labs didn’t score too well.
Again, the aptamer seems to prefer to have one of its ends stabilized by a static stem. There are only few switches that have switching area around both sides of the aptamer.
I think this is one of the reasons why designs that have their MS2 sequence placed in between the FMN aptamer sequences, tend to have their tail bases pair up with each other and become static stem with each other. (Ex 2, Ex 3, SS2) As pairing the tails with each other, closes up the early end of the aptamer.
However for the designs that have the late end of their aptamer closed, the switching has to happen at the early end of the aptamer. I think what will happen will depend on how much sequence is before the aptamer sequence - the beginning and end of the RNA sequence. If there is little, I think dangling tails will happen. Leaving the switching to happen between the MS2 and one of the tails.
Exclusion 5 and 6
I would have tipped Exclusion 5 in particular, but Exclusion 6 too, to some extent to have ended up with their tail section paired up with each other. And I think in a better world they would have had it. But I’m guessing that the bigger distance between MS2 and the FMN sequence than the previous MS2 labs, may have caused the tail sections to be needed for being involved in the switch to make it happen.
Even in Exclusion 6, which only has 4 bases between FMN and MS2, those extra bases seem to cause the majority of the top scorers to go full moving switch - which is bad for the chances of getting a lot of winners. The same is the case for the Exclusion 5 lab. The exclusion labs that did best were Ex 2 and Ex 3, which had a distance of 0, and Ex 1 and Ex 4, which had one. I still think that perhaps something like 2-3 bases may work too.
I think the long distance between MS2 and FMN forces the design to go full moving switch, and this makes the tail too weak to stick together as a static stem, as would normally have been a good solution.
The Exclusion 5 lab makes me think that I have been wrong about distance between the MS2 and the FMN. I have been saying it can be too small. While a bigger distance works well for the Same State labs, Same State labs are different from Exclusion labs in a fundamental way.
It is as if the exclusion designs can better tolerate distance between MS2 and FMN if the MS2 is not between the FMN sequences. By contrast, those labs that have the MS2 “outside” the FMN sequences - that is, before or after and not in between the two FMN sequences - tend to want their tails to dangle. (Ex1, Ex4, Ex6, SS1)
The cutting line seems to lie somewhere between Brourd’s mod of Exclusion 4 and Exclusion 6. The latter has slightly shorter tails and can’t keep its tails together, whereas the one with slightly longer tails can.
Dangling tails tend to go mainly C and U to avoid them pairing up, typically with the main part of the U’s in the first tail and more C’s in the last tail. That's if the tails are not involved in the switching.
Exclusion versus Same State
Exclusion labs seem to want to have the MS2 really close to one of the aptamer sequences, as the labs where more distance was forced, Exclusion 5 and 6, didn’t score too well, and the majority of their high scorers became full moving switches. I had actually thought that having the MS2 and FMN close was a hindrance to higher scores, but it seems that making the distance large (4 bases and up) forces a much more severe shift in the design, whereas designs with the FMN and MS2 fairly close can have an overall stable structure and only switch in a small area. There it will be more of the switching parts moving instead of the full design moving. So if the MS2 and FMN cannot slide, the rest of the design will.
Designs that have MS2 in the 1 state and FMN in state 2, plus have some distance between them, seem to need a fuller moving switching pattern than designs of this type that keep them really close. As a result they will also have higher entropy - due to the fuller move.
Same State lab types, on the other hand, actually prefer some distance between MS2 and FMN. The whole difference between these lab types is whether MS2 is already on in state 1 and needs to get turned off (Exclusion/turnoff labs) or needs to get turned on in state 2 (Same state, Sensor 208a lab type).
I was wrong about the distance thing I said earlier for the exclusion labs. They don't seem to like a much bigger distance between MS2 and FMN. I think what really matters is whether MS2 is present in 1 or 2 state, which affects things a lot. Any lab that has MS2 present in 1 state (and FMN in the opposite state) will be harder to solve (MS2 turnoff labs) than those which have MS2 present in state 2 (MS2 turn on labs). And if a big distance between FMN and MS2 is forced in exclusion labs, it causes the lab to go full moving switch, which will make the designs even harder to solve.
I think the exclusion/turnoff labs can’t have as much space between FMN and MS2 because they will often need MS2 gates.
For a design to turn on MS2, it just needs some complementary stretches somewhere. And the complementary stretches can easily jump a bit. Hence the bigger distance between FMN and MS2 sequence in Same State 1 and Same State 2 (MS2 turn on labs). However, when the MS2 needs to get turned off, the turnoff sequence needs to be close by the MS2 (Exclusion, turnoff labs). Plus when there is a need for a turnoff sequence, this often calls for an additional turn on sequence too (MS2 gates). These are close to the MS2, just like aptamer gates are close to the aptamer.
So I think that the distance needed between MS2 and FMN is to a great extent dictated by switching direction.
I have been talking about MS2 turnoff sequences earlier. They seem to use the same method of operating across the turnoff labs, working in concert with the MS2 gate.
In the turnoff labs (Exclusion type), MS2 is on and formed in State 1 and needs to get turned off in State 2.
MS2 is particularly fond of having its turnoff sequence right after itself or right in front. Which one depends on the position of the FMN. When an FMN sequence is close in front of the MS2 sequence, the turnoff sequence lands after the MS2 sequence. However, when the FMN is close after the MS2 sequence, the MS2 turnoff sequence lands before the MS2 sequence.
Usually this turnoff sequence lands right next to the MS2 sequence. It typically consists of 4-6 bases, although it can in rare cases be shorter or longer. These 4-6 bases are typically complementary to a stretch inside of the MS2. Most of the time it contains a majority of C’s and U’s. This is also what I have earlier called a strong CU magnet segment - although these do not always need to be right next to the MS2 sequence.
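For anyone who wants to scan designs for such turnoff candidates, here is a minimal sketch. It looks for 4-6 base stretches outside the MS2 hairpin that are reverse complements of a stretch inside it. The function names are made up for illustration, only Watson-Crick pairs are considered (GU wobbles are ignored), and the hairpin sequence in the usage lines is an assumption, so substitute the one from your own lab.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Watson-Crick reverse complement of an RNA sequence (no GU wobble)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def turnoff_candidates(design, ms2, min_len=4, max_len=6):
    """List (start, segment) for stretches outside the MS2 hairpin that
    could pair with a stretch inside it.

    Only the first occurrence of `ms2` in `design` is treated as the hairpin.
    """
    ms2_start = design.find(ms2)
    if ms2_start < 0:
        raise ValueError("MS2 hairpin not found in design")
    ms2_end = ms2_start + len(ms2)

    hits = []
    for length in range(min_len, max_len + 1):
        for start in range(len(design) - length + 1):
            end = start + length
            if start < ms2_end and end > ms2_start:
                continue  # window overlaps the hairpin itself
            segment = design[start:end]
            if reverse_complement(segment) in ms2:
                hits.append((start, segment))
    return hits


# Hypothetical usage: hairpin sequence and flanks are illustrative only.
ms2_hairpin = "ACAUGAGGAUCACCCAUGU"
design = "AAAA" + ms2_hairpin + "CCUCAAAAA"
for start, segment in turnoff_candidates(design, ms2_hairpin):
    print(start, segment)
```

A hit only says that pairing is possible on paper. Whether the stretch actually works as a turnoff sequence still depends on the rest of the design.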
Image examples with MS2 turnoff
What is quite interesting here is that the Sensor v3, variant 2 lab, that does not have an aptamer, has a kind of pseudo FMN sequence in front of its MS2 sequence, so it gets similarities to the Ex 3 and Ex 4 labs.
One of the exclusion labs that stands out from the MS2 turnoff pattern is Brourd’s mod of Exclusion 4. In that lab most of the top scorers don’t use a long turnoff sequence for the MS2, nor do they make an MS2 gate. Instead they tend to solve in a style much like the zipper complementary style of the turn on labs like Same State 2, which I find interesting. I look forward to seeing if this pattern shows a way of escaping the more fixed pattern of MS2 gates and turnoff sequences.
I don't have a quantitative analysis, but from looking at the higher scoring submissions from the last round, it looks like most of them have cleaner dot plots than I would have expected for a switch. Given Eli's thoughts on entropy and switches, I found this surprising.
This seems to go along with Nando's idea for ViennaUCT (assuming I interpreted it correctly) that non-switching pairs should be 100% bound, OFF-only pairs about 96% bound, ON-only pairs about 4% bound, and always unbound NTs 100% unbound. Perhaps it is one reason why ViennaUCT has been doing so well in the switch labs.
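As a rough illustration of that idea (my reading of it, not ViennaUCT's actual code), the sketch below turns the two target structures into target pair probabilities and measures how far a predicted dot plot is from them. The function names are hypothetical, the structures are assumed to be pseudoknot-free dot-bracket strings, and the predicted probabilities are assumed to come from whatever folding engine you use, as a dict keyed by (i, j).

```python
def base_pairs(structure):
    """Set of (i, j) pairs in a pseudoknot-free dot-bracket string."""
    stack, found = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            found.add((stack.pop(), i))
    return found


def target_probabilities(off_structure, on_structure):
    """Target pairing probability for each pair, per the 100/96/4 rule."""
    off_pairs = base_pairs(off_structure)
    on_pairs = base_pairs(on_structure)
    targets = {}
    for pair in off_pairs & on_pairs:
        targets[pair] = 1.00  # non-switching pairs
    for pair in off_pairs - on_pairs:
        targets[pair] = 0.96  # OFF-only pairs
    for pair in on_pairs - off_pairs:
        targets[pair] = 0.04  # ON-only pairs
    return targets


def dotplot_deviation(targets, predicted):
    """Mean absolute gap between target and predicted pair probabilities.

    Pairs missing from either dict count as probability 0, so stray
    pairs in a messy dot plot raise the deviation.
    """
    all_pairs = set(targets) | set(predicted)
    if not all_pairs:
        return 0.0
    gap = sum(abs(targets.get(p, 0.0) - predicted.get(p, 0.0)) for p in all_pairs)
    return gap / len(all_pairs)
```

A lower deviation corresponds to the "cleaner dot plot" described above. The number is only meant for comparing designs within the same puzzle.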
For myself at least, I plan to pay more attention to the dot plot this round and spend extra time designing the "fixed" stacks to prevent mismatches.
Has anyone else noticed this (or the contrary)?
The same trend as for many earlier switch winners, with a rise and a dip at an early point (left side of the melt plot), still holds for many good switches. (It's a trend but not an always; there are switch winners without.)
However there are a few new trends I thought worth a mention. Here is an alternative slope that I have seen regularly for switch winners. So when I see this in one of my coming designs, I count it as a good thing, while its absence won't necessarily make me dump the design either:
The microRNA lab winners seem to have their own melt pattern going. I think it is related to the microRNA unzipping the rather long MS2 gate and I'm guessing it is in a hurry. :)
Just a reminder for those who are new to this: a flat beginning is usually good too, even if there are no changes between the states.
A few points of wondering
- Is there any kind of difference in melt plots between turn on and turnoff labs? Are Exclusion labs more messy?
- I wonder if the alternative rise is related to MS2 gates? If so, this is related to exclusion labs. Oh - moment of realization... This is what makes the melt plot in the microRNA labs look so different - the MS2 gate. For now I deem the MS2 gate the culprit. :) This is what causes this alternative melt plot look. This also explains why they tend to show up in the exclusion type of labs and why I like them in my designs. Because I also like MS2 gates. :)
This melt plot drop, which was once quite rare, turned out to be specific to switches:
Some Exclusion 4 Observations Based on Salish99's Modifications of Ex4: 344
For the sake of brevity, this post will focus on the highlights.
Player Salish99 once again used a high scoring design and made several strategic mutations to the sequence. The WT (Wild Type) sequence and histograms are available here:
And this is the histogram for the sequence duplicate that Salish submitted in round 95:
So, from this, the first observation I would like to point out is the distribution of clusters in each of these histograms. The ensemble of clusters shifts without any significant outliers, the slope peaks near the median, and the Kd's for both the no FMN condition and the 0.2 mM condition are similar.
From this, Salish used a variant of the wild-type sequence and made a C16U mutation to alter the next nearest base pair of the FMN aptamer to a U-G base pair. From this altered sequence, Salish made several point mutations to the apical loop of the helix following the FMN aptamer, for both the WT variant and the C16U mutant.
The numerical data for this is available in this Google spreadsheet.
Here are the histograms for the A23U mutation as an example.
For the G-U mutants, this histogram and the histograms of each of the systematic mutations indicate that the dissociation constant is lower in the no FMN condition, that Fmax is typically lower in the 0.2 mM FMN condition, and that the distribution and range of the clusters is typically quite high.
In contrast, the WT variant shows a higher Kd in the no FMN condition, Fmax for the 0.2 mM FMN condition is typically higher than that of the G-U mutant, and the distribution and range of the clusters are tighter.
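For readers who want to see how Kd and Fmax shape a binding curve, here is a minimal sketch. It assumes a simple one-site binding model, which may not be the exact fit used for the lab data, and the function names are hypothetical.

```python
def ms2_signal(ms2_nm, kd_nm, fmax=1.0):
    """Expected signal at a given MS2 concentration (one-site binding model).

    At ms2_nm == kd_nm the signal is half of fmax, which matches the
    definition of Kd as the concentration where half of the RNA is bound.
    """
    return fmax * ms2_nm / (kd_nm + ms2_nm)


def kd_fold_change(kd_no_fmn, kd_with_fmn):
    """How many times larger Kd is with FMN than without it."""
    return kd_with_fmn / kd_no_fmn
```

In this picture a lower Kd moves the whole curve to the left, while a lower Fmax lowers the plateau at high MS2 concentrations. Those are the two effects compared between the G-U mutants and the WT variant above.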
From this data, we formulate the hypothesis that during sequence design, aspects such as the next nearest neighbor of the closing base pairs of the FMN aptamer will have some effects on the final result we see. In this case, a U-G base pair as the next base pair after the locked base pair of the FMN aptamer caused a change in both the dissociation of the MS2 coat protein and its aptamer, and the Fmax cluster intensity in the 0.2 mM FMN condition.
Granted, this is based on a single secondary structure and sequence. It's possible these effects can be mitigated with a different strategy or secondary structure.
I have been wondering about the Exclusion 2 lab, since I made this drawing of the MS2 turnoff sequences.
It stuck out from the switch-off patterns of the Exclusion 1, 3 and 4 labs. In these labs, many of the high scoring designs use a magnet segment of C’s and U’s to switch off the MS2, by attaching to the G section in it, and to help the MS2 turn on again, by pairing with the G’s in the FMN sequence just on the other side of the MS2 sequence.
Image of such typical design:
Notice that the aptamer sequence closest to the MS2 hairpin (before the MS2 hairpin) is what the MS2 turnoff sequence (after the MS2 hairpin) pairs up with in state 1 (left).
Exclusion Lab 2 and the strange design
However, in the Exclusion 2 lab, most of the high scorers followed a totally different pattern, with the main part of the MS2 sequence pairing up for turnoff and not just a section of it. And there was no involvement of the FMN sequences. In other words, a more complicated solution.
However, there was one design, made by Parushev, that behaved differently from the others. It did have the MS2 pair up with one FMN sequence. But it was not the closest one, as was the case in the others of the first 4 exclusion labs. And there was hardly any MS2 gate - just one extra base pair in front of the MS2. This design kept me wondering.
ChP 11-04-2015 #5 (88%) by parushev
In 3D, the distance from the MS2 C’s to each FMN sequence is around the same, since a small static stem (the left 4 bp stem) acts as a shortcut. However, by choosing the far off and early FMN sequence for a pair, the situation around the MS2 gets a little less locked. I think this is important. This may open up further options for solving. So while this design is in the minority of the high scorers for this lab, I think that it has potential to become more prevalent in the next round.
Basically, this specific MS2 turnoff sequence for the above design is an abbreviated MS2 hairpin sequence, with most of the strongest segments, the G’s and C’s, included.
So it seems that we have a way to avoid creating an MS2 gate, just by the choice of which of the two FMN sequences we target: the closest one in sequence/3D space or the one furthest away.
So now I wonder what actually works best? Creating exclusion designs where the FMN closest to the MS2 sequence also pairs up with the MS2 sequence, or creating freer designs that have the MS2 turnoff pair up with the FMN furthest away?
My past lab drawings of switching tendencies showed both types of switches, both ones where the MS2 turnoff sequence went for the one or the other of the FMN sequences.
For now, closest to MS2 seems to have won. However, it also puts quite a lock on what possible solutions can be made. So I wonder if the other may have a say again. At least I will use a good deal of my exclusion slots to swap FMN targets in high scoring designs, to try to figure that out.
Turning off MS2: https://getsatisfaction.com/eternagam...
1) Could we either unpromote all answers (and, yes, they are actually all very important to this thread, but the page has become unloadable when post and promotional report both need to be loaded and scrolled through),
or could we not break this getsat post into multiple pages? I.e. there are now three, and soon to be four, and on each page the promoted ones show up first, so I lose track of where the actual posts begin.
(sorry for all people promoted, you all still deserve it!)
2) The labs have numbers in the spreadsheets, I remember 89 and 95.
But we now had round 2, round 3, round 3.2, now HSa results, round 4, etc etc, and in the results section they still say "DNA template ordered", when the actual results are already out. Could somebody make a Round 1-4 plus small intermediary rounds - to - eternalabnumber conversion table? Thx.
A way to improve the exclusion labs?
As of now, the exclusion labs use the aptamer for turn on in state 2, against having the MS2 on in state 1. MS2 is far stronger than the aptamer.
I wonder if the exclusion labs would not fare better if their states were reversed. As for the same state labs, I don’t think a reversal would improve things, rather the contrary. As is, both the aptamer and the MS2 get turned on in the second state. So there is a pretty strong pull towards turning on and getting the switch moving.
I noticed that my designs in the lab Exclusion NG 2 had gotten an unreasonably high number of clusters, compared to the general average of clusters for designs in that lab.
What was different about them, besides that many of them had the MS2 turnoff sequence aim for the last bit of the FMN, was that they had an overhang of C and U bases. Which, it reminded me, was exactly the case for the microRNA labs, which didn't exactly suffer from bad cluster counts either.
One of my designs (83%) and cluster count 221
So now I wonder if any kind of overhang will cause higher cluster counts or it is mainly C and U ones. I'm guessing at the latter.
These unpaired dangling C and U stretches in one or both states are kind of a hallmark of switches; they are not isolated to the microRNA labs. I have earlier found that C and U overhangs in particular were overrepresented as the switching strand in high scoring switches, also in past switch labs. Not in all high scorers, but in a great many.
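To check this on a batch of designs, one could measure how C/U-rich the dangling ends are. Here is a minimal sketch, with hypothetical function names, that takes a sequence and a dot-bracket structure for one state and reports the C+U fraction of each unpaired tail:

```python
def cu_fraction(segment):
    """Fraction of bases in the segment that are C or U (0.0 if empty)."""
    if not segment:
        return 0.0
    return sum(base in "CU" for base in segment) / len(segment)


def dangling_tails(sequence, structure):
    """Return the unpaired 5' and 3' tails for a dot-bracket structure."""
    first_paired = structure.find("(")
    last_paired = structure.rfind(")")
    if first_paired == -1:
        return sequence, ""  # nothing pairs: treat everything as one tail
    return sequence[:first_paired], sequence[last_paired + 1:]


def tail_cu_report(sequence, structure):
    """C+U fraction of the 5' tail and of the 3' tail."""
    five_prime, three_prime = dangling_tails(sequence, structure)
    return cu_fraction(five_prime), cu_fraction(three_prime)
```

Comparing these fractions against cluster counts across a lab would be one way to test whether it is C and U overhangs specifically, or any overhang, that goes with the higher counts.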
I have done more extensive notes on the cluster counts in relation to the NG labs here:
Multiloops for the switch
I have earlier mentioned that I thought switches benefited from multiloops in either of the states or in both at once.
“It could look like designs with multi loops have a slight advantage.”
Now there is a good deal more switch data to base that on, and multiloops are hugely present in the MS2 labs where we have had the best luck.
There seems to be something about having a multiloop in each state that helps the design avoid taking a step down the road towards a full moving switch. Here is one of the reasons I hate full moving switches: unless the switching elements are really far apart, the design doesn't really need to be a full moving switch. The partly overlapping multiloops help keep the switching elements in close range of where they are needed.
Also notice that the designs in lab rounds Ex 5 and Ex 6 mostly have no multiloops in either of their states. And they are also mostly full moving switches. Whereas anything that has tails long enough to tie up (Ex 3, Ex 4B, SS2) often has multiloops in both states, unlike Ex 1 and Ex 4, which had really short tails.
Just as Omei mentioned, there is something about the presence of hairpin loops that prevents snake designs. As he said:
"The requirement that there are at least two hairpin loops in both states eliminates “snake” designs, which seem to be too rigid to switch well."
I honestly think that is a strategy on its own. ;)
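Both the two-hairpin rule and the multiloop observation can be checked mechanically from the target structures. Here is a minimal sketch, assuming pseudoknot-free dot-bracket strings. The function names are made up for illustration, and the exterior loop is not counted as a multiloop.

```python
def pair_table(structure):
    """table[i] is the partner index of base i, or -1 if unpaired."""
    table = [-1] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            table[i], table[j] = j, i
    return table


def loop_counts(structure):
    """Return (hairpin loops, multiloops) for one structure."""
    table = pair_table(structure)
    hairpins = multiloops = 0
    for i, j in enumerate(table):
        if j <= i:
            continue  # unpaired base, or the closing side of a pair
        # Count helices that branch off directly inside the pair (i, j).
        branches = 0
        k = i + 1
        while k < j:
            if table[k] > k:
                branches += 1
                k = table[k] + 1  # skip over the enclosed helix
            else:
                k += 1
        if branches == 0:
            hairpins += 1
        elif branches >= 2:
            multiloops += 1
    return hairpins, multiloops


def passes_two_hairpin_rule(state1, state2):
    """True if both states have at least two hairpin loops (no 'snake')."""
    return loop_counts(state1)[0] >= 2 and loop_counts(state2)[0] >= 2
```

Running `loop_counts` on both states of the top designs in each lab would give a quick table of which labs favor multiloops in both states.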
A Structure Design Pattern for Eterna MS2 Riboswitches
Ways to get multiloops happen
To get multiloops happening in the switching area for sure in both states:
It is generally a help tying up the tails at each end of the RNA sequence with each other.
Add a static stem in the switching area.
These two pieces of advice often end with the same result - namely tied tails, which often are the added static stem. It isn’t always a small static hairpin loop stem that gets added in the switching area; a static neck, when involved in the switching area, counts too.
I am inclined towards saying that tying tails should be an always, although I know we would get a hard job with Ex 1 and Ex 4, due to their short tails. And I have seen riboswitches that switch at their ends. However, these typically do not involve multiloops. At the bottom of this post, I have a link to a collection of natural riboswitches, as shown in their ON state. While not all of them have a multiloop, many of them do, especially the bigger ones. And the ones that have multiloops are the ones I suspect are switching inside of themselves. Those without seem to be switching with their tail regions. But this is of course guessing on my part.
Pattern sum up
Pattern: Multiloops present in both states.
Where? The multiloops should be present in the switching area in each state.
What do they do? The multiloops help steer the switch, by keeping a part of the multiloops stable, while allowing the rest of the stems in the multiloop to move. Having 1 or 2 static stems in each of the multiloops gives the design a stable scaffold to do the switching from and also helps keep the switching confined to a very small area.
What does it look like? Two multiloops in either state, with one static stem as overlap between the multiloops and with some of the other stems as switching parts.
How: Make multiloops at the switching end of the aptamer, with the static stem at the side of the aptamer gate which is furthest away from the MS2 hairpin. The aptamer gate will be one of the multiloop stems in one of the multiloops and the MS2 hairpin another multiloop stem. Each state should have a minimum of two static stems.
- Demonstration of pattern in my winner from this round, based on one of Vinnie's designs.
Fiskers NG3ss - 39 Score: 96.1%
- Connection with other patterns: I think the multiloops owe a big part of their magic to the static stem as a scaffold for holding the switch - along with the static end of the aptamer. Also, the multiloops help bring the switching elements into close range.
Thx to Omei for showing me a way to nicely sum up what I think a pattern does.
Multiloops in switches
Rfam riboswitch picture archive - of bound shape
On Static Stems = Stability for Switching
I have earlier been wondering about that static hairpin loop stem which seemed to turn up in the switching area in many of the lab high scorers.
I’m really starting to think that this static stem has a function, other than just packing single bases away out of misfolding harm's way. I think it serves a stabilizing purpose, because it tends to turn up in the same spot: right after or close after one of the aptamer gates.
There seems to be something about multiloops too that seems to be helpful to the switching, since multiloops turn up in both states in a huge number of the switch labs now. Perhaps not because the multiloop itself is a help alone, but perhaps more that in the most successful labs (SS2 and Ex3), the switch has two static stems - the one stabilized end of the aptamer and one static stem in the switching area.
I think this static stem works in concert with the multiloop in the switching area. I think the multiloop, together with the static stem, allows the structure to be overall stable enough to support a switch. The static stem is the stable and unmoving part in the multiloop, while the two other switching stems - the aptamer gate and the MS2 hairpin - have a stable foothold to swap around between them (and a possible turnoff/turn on sequence too). With the static end of the aptamer holding the whole switching piece of art at arm's length.
Normal amount of static stems
It seems like most of the high scoring designs like to have at least two static stems in state 1 and two in state 2.
So preferably both states should have 2 static stems each - one static stem at the non moving end of the aptamer and 1 static stem involved in the multiloop in the switching area. There are some lab designs with more than 2 static stems, but I think having 2 static stems is the basis for a good design.
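One way to count static stems is to take the base pairs that are identical in both target structures and group them into runs of stacked pairs. Here is a minimal sketch, assuming pseudoknot-free dot-bracket strings of equal length. The names are hypothetical, and the default minimum of 3 pairs is a choice made here to keep out single shared pairs.

```python
def base_pairs(structure):
    """Set of (i, j) pairs in a pseudoknot-free dot-bracket string."""
    stack, found = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            found.add((stack.pop(), i))
    return found


def static_stems(state1, state2, min_length=3):
    """Stems present in both states, as (start, end, length) tuples.

    start/end are the indices of the outermost pair of each shared stem.
    Only runs of directly stacked shared pairs count as one stem.
    """
    shared = base_pairs(state1) & base_pairs(state2)
    stems = []
    for i, j in sorted(shared):
        if (i - 1, j + 1) in shared:
            continue  # not the outermost pair of its stem
        length = 1
        while (i + length, j - length) in shared:
            length += 1
        if length >= min_length:
            stems.append((i, j, length))
    return stems
```

With this, `len(static_stems(state1, state2))` gives a quick check of whether a design reaches the two static stems suggested above.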
For now, Same State 1, Exclusion 1 and Exclusion 4 generally don’t have 2 static stems in each state, and therefore they don’t have multiloops. But I bet they want them.
In the Same State 1 lab, this is what one of the current top scorers looks like. It doesn’t have the 2 static stems in each state, which I think are beneficial for getting the structure stabilized and switching. So I think it needs one more. So here is what I predict a lot of the winners are going to look like if we get one more SS1 lab. (Green pen drawing)
Pink = MS2
Light blue = static stems
Orange = FMN sequences
Yellow = Salish hinge
Grey sequence - the aptamer gate furthest away from the MS2 sequence
Green and red - switch magnet segments.
Perspective: Where to put the static stem?
I think I can say where there needs to be a static stem. Based on which aptamer gate is furthest from the MS2 sequence.
From how the grey MS2 mirror sequence is placed in the aptamer gate, the grey aptamer gate needs to be furthest away from the MS2 sequence. That leaves the static stem to land at the tail ends, which is where I prefer it in any case. That makes the tails knot up. :)
So an added static stem doesn’t necessarily need to be a hairpin with loop. It can just as well be a neck/two end tails tied together also. What determines where it needs to be is the position of the aptamer in relation to the neck of the puzzle.
For Exclusion 1, I think the grey sequence should be after FMN2, followed by a static stem - for which there are too few bases in the sequence.
Similarly, for Exclusion 4, I think the grey sequence should be before FMN1 and the static stem before the grey area.
When I had drawn it, I realized that top scorers in the lab Brourd’s mod of Exclusion 4 already fit under this. The highest scoring design (when counting 20+ clusters and the rerun in round 96; score 87%) follows a closely similar pattern.
Pattern Sum Up
Pattern: Static stem in switching area
Where it happens: FMN/MS2 switches benefit from having a static stem in the switching area at a particular spot. Which is after the aptamer gate that is furthest away from the MS2 or furthest away from MS2 interaction.
What problem does this pattern address? The static stem packs away unpaired bases, which is an advantage since our switches have a general dislike of longer stretches of unpaired bases. The static stem also helps stabilize the multiloop in the switching area, so the MS2 hairpin and the aptamer gate get the stability needed to interact, either directly or via intermediate sequences, to turn on or off.
What does it look like? The static stem is like a regular hairpin loop or neck made of the tail sequences paired. It can have different lengths.
How: Make the static stem a minimum of 3 base pairs long, and preferably 4.
Demonstration of the same static stem attached in a multiloop shared between the states:
- Which patterns is this pattern related to? The static stem causes a multiloop to form in the switching area. The static stem works in concert with the static end of the aptamer, holding the RNA design in position so that interaction between MS2, FMN and intermediate sequences can take place. The static stem, together with the multiloops, helps bring the switching elements into close range.
Thx to Omei for showing me a way to sum up easily what I think is going on.
Multiloops for the switch
When to make tails pair
To make it easier for interested players to look into how the same design scores in different rounds, I merged fusion tables from Rounds 95 and 96 for sequences that were synthesized in both rounds. It turns out that there were 600 such sequences, 23 more than the 577 that johana purposely created. The resulting table is here.
I haven't tried to analyze the table in any depth, but here is one interesting graph that gives a feel for the data.
There's obviously a very high correlation between the scores, but it isn't perfect. In the worst case, the Same State 2 design Rediin score differed by 23 points between the two rounds.
Caveat: In the merged table, there are generally two columns for every column name, one from each round. The certain way to tell them apart is that whenever you are presented with a list of columns to choose from, the R96 columns all come before the R95 ones. For the handful of fields of most interest, I manually changed the R95 names by prepending "Previous" to the R95 data. If you are going to analyze data from other columns, I recommend you make a copy of the table and make the names unambiguous for the columns you care about.
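For anyone who would rather redo the merge locally with unambiguous names from the start, here is a minimal sketch in plain Python. It assumes each round has been exported as a list of dicts (for example via `csv.DictReader`). The column names "Sequence" and "Score" are hypothetical stand-ins for whatever the exported tables actually use, and scores are assumed to be numbers already.

```python
def merge_rounds(r96_rows, r95_rows, key="Sequence"):
    """Inner-join two rounds on sequence.

    R96 columns keep their names; every R95 column is prefixed with
    "Previous " so that no two columns share a name. If a sequence occurs
    more than once in R95, the last occurrence is used.
    """
    previous = {row[key]: row for row in r95_rows}
    merged = []
    for row in r96_rows:
        old = previous.get(row[key])
        if old is None:
            continue  # sequence was not synthesized in both rounds
        combined = dict(row)
        for name, value in old.items():
            if name != key:
                combined["Previous " + name] = value
        merged.append(combined)
    return merged


def largest_score_change(merged, score="Score"):
    """Row whose score differs most between the two rounds."""
    return max(merged, key=lambda r: abs(r[score] - r["Previous " + score]))
```

Prefixing every R95 column, not just a handful, avoids having to remember which copy comes first in a column list.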