We are very excited about the results, which have been posted in Eterna and are also summarized in a PDF (https://drive.google.com/file/d/0B_N0OA9NROPGeURwMzctVk9GLTQ/view?usp=sharing) and a Google spreadsheet (https://docs.google.com/spreadsheets/d/1Ut_h4cFSv6lq92omYzlh7EDYHRq6cdxk4ML5FwRWrks/edit?usp=sharing).
Looking forward to hearing your thoughts!
The results contain the experimental data and the fits, which use the Hill equation.
Many designs have only a few clusters, which results in a larger spread of fit values due to poor statistics. As a consequence, many of the high scores are probably due to designs having only one or a few clusters. The blue points in the plots represent the cooperativity puzzle.
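For readers who want to play with the fits, here is a minimal Python sketch of the Hill equation in its common textbook form. The function name, the parameter names (`kd`, `n`, `f_min`, `f_max`) and the optional baseline terms are assumptions for illustration; the exact parameterization used for the lab fits is not given in this post.

```python
def hill(conc, kd, n, f_min=0.0, f_max=1.0):
    """Signal at ligand concentration `conc` under the Hill equation.

    kd    : concentration giving a half-maximal response (assumed name)
    n     : Hill coefficient; n > 1 indicates positive cooperativity
    f_min : baseline signal without ligand (assumed, defaults to 0)
    f_max : saturated signal (assumed, defaults to 1)
    """
    if conc < 0 or kd <= 0:
        raise ValueError("conc must be >= 0 and kd must be > 0")
    bound = conc**n / (kd**n + conc**n)
    return f_min + (f_max - f_min) * bound


if __name__ == "__main__":
    # Compare a non-cooperative (n=1) and a cooperative (n=2) curve.
    for conc in (1.0, 10.0, 100.0):
        print(conc, round(hill(conc, 10.0, 1.0), 3), round(hill(conc, 10.0, 2.0), 3))
```

A higher `n` gives a steeper transition around `kd`, which is the signature one would look for in a cooperative design.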
Designs with a high number of clusters again generally have an insane number of U repeats, often very long ones. The same holds for the designs with very high cluster counts in the other MS2 labs.
However, in the other MS2 labs the winners, including those with a somewhat decent cluster count of at least 20-30 clusters, seem to actively dislike having many long U repeats.
So somehow long and multiple U repeats help cluster counts, but they don't seem to increase the number of actual winners.
Repeats of repeats
One more thing stands out in the lower scoring designs: they have a high number of repeated bases in general.
Most of the designs had very low cluster counts. So I set a cutoff of 10 clusters for what I wanted to take seriously, just as I did for the first MS2 lab results, with the intention of raising that number when future data allows it.
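The cutoff-then-rank step described here can be sketched in a few lines of Python. The record layout (`name`, `score`, `clusters`), the function name and the sample values are all hypothetical, made up only to show the mechanics; they are not lab data. The sketch also assumes the cutoff is inclusive (10 clusters or more).

```python
def rank_designs(designs, min_clusters=10):
    """Drop designs below the cluster cutoff, then sort by score, best first."""
    kept = [d for d in designs if d["clusters"] >= min_clusters]
    return sorted(kept, key=lambda d: d["score"], reverse=True)


if __name__ == "__main__":
    # Hypothetical records, not real results.
    sample = [
        {"name": "design_1", "score": 95, "clusters": 2},   # high score, too few clusters
        {"name": "design_2", "score": 80, "clusters": 14},
        {"name": "design_3", "score": 60, "clusters": 40},
    ]
    for d in rank_designs(sample):
        print(d["name"], d["score"], d["clusters"])
```

Raising `min_clusters` later, when more data comes in, only changes the one argument.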
For most of my designs, I have had the MS2 sequences in contact with each other in the OFF state, either for short or long stretches. I had mostly not built in strong preventions against MS2/MS2 interaction. What I had hoped was to use the identical sequences, or parts of them, for turnoff/brake.
I sorted the data by score, keeping only designs with a cluster count of at least 10. The two designs that got a somewhat decent score were both by Brourd. What they have in common is that they lock up the MS2 aptamers far from each other.
ThBP B (86%)
ThBP C (84%)
Omei’s mod of one of my riboswitches, which did not do too badly, also had the MS2’s locked up a bit away from each other, while still leaving complementarity for an MS2 turnoff with the MS2 sequences themselves.
What I was often attempting was both switching around the individual MS2 switch and switching between domains.
But overall these lab results got me thinking, as this was not what I had expected. I had kind of expected cooperativity between MS2’s to be like a game of dominoes. I assumed that the domino effect of making one MS2 turn on and getting it to spill over to the other had to do with closeness in space, so that a change in binding at one MS2 would directly touch and affect binding at the other MS2. So a kind of push interaction, not seemingly more detached, independently binding units.
This all got me wondering what real natural double riboswitches would look like.
Naturally occurring tandem riboswitches
I was reading about the glycine riboswitch that Johan mentioned in his summary of the first round of cooperative switches (Results for Eterna R96: Cooperative Binding - double MS2).
So I decided to look up the glycine riboswitch and found an image.
Another interesting thing is that each switching area in the glycine riboswitch contains what appears to be a static stem (P2). :)
Each of the glycine riboswitches also has multiloops, even though this kind of aptamer only involves one switching element and not two like our FMN/MS2 ones. I see multiloops turn up in many other riboswitches with a single ligand as well.
I read a few papers on the glycine riboswitch and found this bit interesting: the aptamers are more separate, and they operate cooperatively nonetheless.
I found it particularly interesting that the glycine riboswitch functions as a digital sensor. Or explained like this: a light switch, where 1 = light and 0 = darkness. So extreme and big changes, rather than just turning the light up or down a bit. So I’m guessing the other aptamer types can be called analog. :)
Advice for next round
Now we have very few winners and still shaky data for round 1, so take this as only pointers towards what I think will work.
Make separate MS2 riboswitch units
However, this bit I am very certain about. When you make two separate RNA sections - here two riboswitch sections each holding its own MS2 - you should:
Make sure that both MS2 riboswitch units vary from each other in structure (can be either stem or unpaired base region)
Make sure that both MS2 riboswitch units vary in sequence
So hereby the designing tip is passed on.
Expanding a bit on Brourd's ThBP B submission, compared with its variations: taken together, ThBP B, ThBP C and ThBP D (which differed only in a few mutations that didn't change the structure) were represented by a total of 30 clusters, and they had an average score of 84. On the other hand, ThBP A (15 clusters) had an unpaired strand in place of the static hairpin, and it had a score of only 70. This suggests that binding up long strings of unpaired bases into static hairpins may be beneficial, just as it has been shown to be beneficial for the switches.
Since my adventure started in the cooperativity riboswitch department, here is where I will put it up. Thx to Omei for discussion.
Switch Elements - the bases of switching
While I’m aware that it is very well possible to create aptamers to pretty much anything via lab evolution, I have started to wonder if there is some sequence pattern to the natural riboswitches.
I keep seeing a lot of the same sequences come up, both in our switches and lately also in the naturally occurring ones and those that are engineered. I keep seeing exactly the same two kinds of sequences pop up: stretches of purines (G and A bases) and of pyrimidines (C and U bases), particularly in the switching area, but also to some extent outside of it, and at frequencies that are not normal, at least not for static designs, where I have been scolding them quite seriously. :) And I even keep seeing these purine/pyrimidine stretches turn up at similar spots. As such I find it very interesting, as it suggests that we might be able to use them rationally to make better switches.
RNA is a game of frequencies, and it is about balancing those frequencies against each other. Too much of one or more frequencies, however good they are on their own, will cause misfolds.
However, different types of RNA have their own distinct frequency patterns. For example, RNA with short stems needs a high GC base pair frequency, and long designs prefer a more AU heavy solution.
For static designs, the normal pattern is that purines primarily turn up in the loop regions and pyrimidines tend to turn up more in the stem regions. However, normally the base pairs are far better off when mixed. Long stretches of either purines or pyrimidines in stems are suspicious, unless the stems are very long or perhaps part of a coaxially stacking 4-way junction.
For the switches I see a purine/pyrimidine frequency pattern that breaks the usual pattern. So I say there is a pattern, that's interesting, let's figure out how to use it.
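To make the pattern hunt a bit more concrete, here is a small Python sketch that scans a sequence for unmixed purine (G/A) or pyrimidine (C/U) stretches. The function name, the default minimum length of 4 and the example sequence are assumptions chosen for illustration, not values taken from the lab data.

```python
import re


def unmixed_runs(seq, min_len=4):
    """Return (start, stretch, kind) for every purine-only or
    pyrimidine-only run of at least `min_len` bases (0-based start)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input too
    runs = []
    for kind, base_class in (("purine", "[AG]"), ("pyrimidine", "[CU]")):
        pattern = f"{base_class}{{{min_len},}}"  # e.g. "[AG]{4,}"
        for match in re.finditer(pattern, seq):
            runs.append((match.start(), match.group(), kind))
    return sorted(runs)


if __name__ == "__main__":
    # Made-up sequence, only to show the output format.
    for start, stretch, kind in unmixed_runs("GGAACUUCCGA"):
        print(start, stretch, kind)
```

Running the same scan over a set of high and low scoring designs would be one way to check whether the stretches really cluster around the switching area.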
Riboswitches sum up
Pyrimidine C and U bases at active aptamer site
The switching seems to be initiated a lot by particular sequences, either with CU segments at the active molecule catching site or with GA segments instead. Especially the pyrimidine CU segment, with its short and flexible bases, seems to be very good at catching a molecule.
This kind of riboswitch typically contains an aptamer loop with 5-6 CU bases in the loop ring and two GA stretches in stems close by in either sequence or space.
Purine G and A bases at active aptamer site
But also regularly the aptamer loop bases that are doing the catching are the more rigid and bigger purine GA bases. (Like in the FMN aptamer)
This kind of riboswitch typically contains an aptamer loop with 4-5 GA bases (which can be spread on either side of the stem attached to the aptamer loop) and two CU elements in stems close by in either sequence or space.
Pyrimidine stretches outside of the aptamer
The MS2/FMN turnoff sequence and variations of it keep turning up in the switching area close to the aptamer, and sometimes even inside it.
Small hairpin loops regularly contain a pyrimidine sequence in the on state that is likely used for turnoff in the off state.
Purines versus pyrimidines - or G and A bases versus C and U
I’m currently learning about alkanes on Khan Academy. I have a chemistry book that I try to follow alongside.
It mentioned that alkanes formed into rings are less flexible than the same number of carbons in a chain. (Page 60 in Fundamentals of Organic Chemistry, 7th edition, International version)
This made me think about how pyrimidines - U and C bases - only have one ring of 6 atoms, though not all of them carbons, whereas the purines have two rings, one of size 5 and the other of size 6. (For more on pyrimidines and purines check this forum post)
This made me think that the purines must be less flexible than the pyrimidines, and that as such pyrimidines must have a lower melting point than purines. So I went and checked, and it really was so.
Another thought also popped up. I had seen those flexible pyrimidines in action a lot somewhere very specific, namely in the switching area of the switches. Not only in the FMN/MS2 labs, but also in the FMN-only labs and the TEP labs too. Even in the winner and top scorers of the Theophylline Hammerhead riboswitch lab we did a long time back.
Now the bases, no matter which type, are attached to the same kind of backbone, but I still think that how much space they fill has some role to play.
Perhaps something else is in play? I read that when two proteins bind to each other, they rearrange themselves such that the hydrogen bonds between them are at similar distances. (How proteins work, Mike Williamson, page 22, figure 1.30) So perhaps this is the explanation for similar purine bases sitting next to each other, and likewise for pyrimidines. Then I would expect to see the short U’s and C’s at the ends of stretches of the longer purine bases in aptamers, and similarly A’s and G’s at the ends of pyrimidine stretches.
Omei: One thing I did want to say as I was reading your recent thoughts, but then couldn't find exactly where you wrote it. It was about the strand segments of all purines or all pyrimidines. It makes sense to me because of the spatial arrangement of the base pairing geometry when they form a double strand. I'll be very interested to see if these always play the same role in riboswitches, or whether they are more of a multi-purpose tool.
I think it is actually both. The purine/pyrimidine strands should somehow make it easier to break a stem, but these pyrimidine or purine stretches are also good for holding substrates at an equal distance. I added a drawing, the one below. I keep seeing these CU and GA stretches turn up not only in the stems around the switching area, but also in the aptamer loops themselves. Not always as the sole bits of the aptamers, but as a strong player. I even think that these players have somewhat fixed roles. They are more likely to turn up in some places than others, both in sequence and geometric space.
That does not mean we can't stray from the pattern that nature seems to like. I just think the natural and engineered riboswitches have a lot in common. Some of the different types are really just negatives of each other: either they keep the GA pattern in the aptamer ring, or they keep the CU pattern in the aptamer. The rest is just a matter of matching up and making sure the complementary stretch is there in a stem nearby, to be able to turn the pattern in the aptamer on and off. Plus I think there is a trend for the pyrimidine or purine pattern to be longer in the aptamer and shorter in the stems, probably to make the aptamer pattern win out.
Aptamers of purines or pyrimidines
Similarly, I had seen this C and U sequence pop up in the riboswitch that I had been reading about lately, namely the glycine riboswitch.
So now I started wondering if this pyrimidine CU pattern, or conversely its purine counterpart GA, was in action in other naturally occurring riboswitches. And I keep seeing it turn up, either as CU in the aptamer loop directly or as its negative, GA, in the aptamer loop instead.
The FMN piece inside the MS2 hairpin
A while back Omei sent me an image he had taken of a lab he was working on.
"I was working to create an Exclusion 5 lab switch that relies on a pseudoknot forming in the bound state when I stumbled across a pattern that seems very general. It also, in retrospect, seems very obvious, so I am guessing that we already have some data on it.
The basis for the pattern is that the aptamers for FMN and MS2 share the common sequence of AGGAU. So the general pattern, which seems promising for any lab, would be to use the complementary pattern AUCCU in a way that binds with one or the other of the aptamers."
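The relation between AGGAU and AUCCU in the quote is a plain reverse complement, since paired strands run antiparallel. A minimal Python sketch (the function name is an assumption, and only Watson-Crick pairs are considered, so GU wobble pairs are ignored):

```python
# Watson-Crick partners for RNA bases.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(seq):
    """Return the strand that pairs with `seq`, written 5' to 3'."""
    seq = seq.upper()
    unknown = set(seq) - set("ACGU")
    if unknown:
        raise ValueError(f"unexpected bases: {sorted(unknown)}")
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("AGGAU"))
```

The same helper can be pointed at any other shared segment to get a candidate turnoff sequence for it.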
Omei asked if I recognized seeing this sequence in other labs.
I did recognize this little fellow. I mean, not the exact sequence, but I recognize it for what it does. It's the MS2 turnoff sequence. And there are a number of variations of it, like GUUCC and GCCUU, and the microRNA labs have other similar but less locked variations, since they have no FMN to conform to. The main ingredient, however, is the CC's, since they are compatible with both the G's in the MS2 and the twin G's in the FMN. The MS2 turnoff in MS2/FMN labs in general has 2 C's and then some U's. Not always a G. The MS2 turnoff sequence is often very similar inside the same lab. How strong it needs to be partly depends on how far in sequence and space it has to move.
The case Omei had found is a perfect match between MS2 and aptamer and makes it much more obvious what goes on. It is not only the MS2 turnoff. It is the aptamer turnoff as well.
While I knew that the goal with this turnoff sequence was often to target both the MS2 and the FMN, I had not fully thought through that the FMN and MS2 actually carry two identical sequences. The MS2 has a kind of mini FMN inside it.
This CU pattern runs like an undercurrent through many Exclusion labs. The CU aptamer turnoff pattern also runs like wildfire through the past FMN labs, as does its GA partner. (Periodic repeats) Mostly it is important for turning off the MS2. It is often, but not always, involved with the FMN too, as the MS2 C's are also regularly used directly for aptamer turnoff, although it is regularly used for both MS2 and aptamer turnoff. Also, the more zipper complementary solves target the aptamer gates instead of the aptamer sequence itself. Basically they do the same as the aptamer and the MS2 sharing sequences: they just make the aptamer gate strands complementary to the strands of the first stems in the MS2.
The MS2 has the pyrimidine CU pattern hidden up inside it, whereas the FMN aptamer has its purine negative of GA’s.
Actually the MS2 aptamer is magic. It has both purine GA and pyrimidine CU pattern available inside it.
The FMN aptamer is even more magic. Not only does it go both ways, it can be involved on either side at each of its ends. Sometimes even both at once. :)
As I said to Omei, it was as if the FMN and MS2 were meant for each other. Both because they have identical sections - carrying a part of the same switching element - so they could be made to share a partner (Exclusion), but also because they have parts that could directly interact (Same State): MS2 C’s to FMN G’s. However, they do care - a lot - about how they are placed in relation to each other. Too far or too close and they are not effective.
So the next question was whether these pyrimidine/purine strand patterns were more than just an oddity of our in-game switch labs and a few outside ones.
But why there regularly seem to be so many additional CU and GA stretches in switches, I have no idea. Can it really be that the whole unbound switch is moving in some cases, and bigger parts of it in others?
I have earlier been complaining about these specific unmixed base sections in static labs. Bases in stems that are not well mixed have more of a habit of breaking open and mispairing. Longer stretches of CU’s, GU’s, CA’s or GA’s are bad, especially if there are multiple strands of them. OK, in the beginning I thought that long lines of CU’s were beneficial, because I saw them in longer stems. (I wonder if that kind of sequence has any effect on coaxial stacking, or whether it is mainly loop regions that contribute to that?)
So basically what I’m speculating is that there are particular sequences that are particularly good at switching, over others. I have earlier been pointing out that the switches seemed to have more repeat bases. Purine and pyrimidine repeats are just a different flavor of repeat.
Loose thought: what if the potentially free GA stretches after this imagined turnoff could start to sense the glycine when it is around? Glycine likes to bind to GA stretches anyway. And what if this is enough to make the turned off riboswitch let loose and open the real binding site to the glycine? Could this be an explanation for why so many extra of these seemingly unnecessary GA and CU stretches turn up? I mean, really only the ones in the aptamer are needed.
Theophylline Hammerhead ribozyme and the turnoff sequence
Orange highlights the only sequence we could change.
Winner by AndrewM2A
The highest scoring designs have a high rate of C’s and U’s. And sequences that could look like MS2 and FMN turnoffs.
Notice that the 4th highest scoring design has the exact sequence that matches a section in both FMN and MS2.
What the theophylline riboswitch looks like in Eterna. (Locked bases) It practically has an FMN/MS2 turnoff sequence built in. :) Double U and double C work like a charm to turn off MS2, and by reversing them one can choose which of the two FMN sequences to target.
Again it has one of its CU segments in the hairpin loop in the on state. Actually this pyrimidine hairpin loop, with rather specific sequences, is something I see turn up in a lot of switches in the ON state. I generally think it is used for turnoff with a purine stretch in the off state. (I'm not sure why the two sequences are not fully identical)
More unmixed bases. (Orange box) In the past I have been particularly angry at longer GU stretches, since they had a habit of splitting our fine single state puzzles and making misfolds happen.
I had been wondering what it was with that G that kept turning up after, before, and sometimes on both sides of the CU lasso. I think this image shows it very well. They work as stabilizers to hold the aptamer when it has its molecule around.
This is really interesting. The homepage says that the binding of the theophylline makes this small AGG section available for the ribosome and translation. So the dangling section of GA’s are needed for the translation process. The riboswitch is not an island - it is part of a bigger whole. :)
I asked Google if ribosomes really did want purine starter sites and found a page by iGEM.
Ha! It really does look like it.
“Very roughly speaking, ribosome binding sites with purine-rich sequences (A's and G's) close to the Shine-Dalgarno sequence will lead to high rates of translation initiation whereas sequences that are very different from the Shine-Dalgarno sequence will lead to low or negligible translation rates.” http://parts.igem.org/Help:Ribosome_Binding_Sites/Mechanism
This is about bacteria. But really that's also where aptamers and riboswitches have been found. So I’m game. :)
It looks like these iGEM guys are doing a lot of thinking about sticking different bioelements together, with intention of being able to build with it. Could be a rather interesting site for us guys. Looks like we are family. :)
I have been doing some color highlighting in the image. A, B and D have the CU sections outside of the aptamer loop area, but the GA purine section in the aptamer. I notice they catch the ligand by making a bow circle around it. It's pretty.
Notice how remarkably similar the hairpin loops are. Actually they carry a variation of the FMN and MS2 turnoff sequence (CUUCG, and the G at both ends of the sequence). Not too sure about figure C, as it behaves a lot differently. But in the others I have highlighted what I guess is the switch.
I’m guessing that if anything switches for real in the off state, it will be the aptamer GA’s with the hairpin loop CU’s. Those are the shorter ones and easier to get moving. The neck of the design is sealed off pretty strongly with GC base pairs.
And basically this is what I think these CU and GA stretches are for: getting the switch to move more easily. If the GA’s are in the aptamer, then I guess the closest (in sequence if not space) pyrimidine CU stretch is used for riboswitch turnoff. Similarly, if the pyrimidine CU stretch is in the aptamer loop, then I guess a GA section is used for turnoff of the aptamer when there is no ligand around.
Another thing I find interesting is that natural riboswitches can bind purines. The adenine switch above was one of them.
That led me to wonder whether such an aptamer would keep pyrimidines in the aptamer ring to catch the purines. And it looks like it.
Actually there are even more of these CU and GU stretches. That's an unusually high frequency for such a small design. It's not something I would have recommended if one wishes one's design to be static. This is why I recommend crossing at least some of the GC pairs in most of the stems, which this one actually follows. But it is still an unusual sequence frequency, especially if the design had to be static.
I find it a beautiful detail that pyrimidines are holding an adenine.
Also it is visible that the two hairpin loops are actually kissing each other. This detail also got mentioned in a paper by Rhiju: Link. I have been wondering if static stems had a function, and they surely do. :)
This odd frequency pattern reminded me of a classic Eterna lab, the Cross lab.
Back then I actually thought this long CU pattern was beneficial (blue green strand). However, I later changed my mind: it was only tolerated in longer stems, and it was bad in anything with short stems that was supposed to be static. Especially if the pattern went between loop and stem, which is exactly what I see it do in these switches with CU’s in the hairpin loop.
The blue green strand pattern ran rampant in the Cross lab. And I suspect that it is somehow beneficial when coaxial stacking is going on. But I have no idea. I’m guessing that it prevents nearby neighbor stems from pairing up when unwanted. Usually two complementary strands wish to pair with each other, especially if they are close in sequence. Not too close for a loop to form, but not too far either. If two fitting strands are neighbours with enough spacer bases to form a hairpin loop, they are as good as a match. And this antiparallel pattern between neighbouring strands ensures that a strand doesn’t go for the neighbor before when it is meant for the neighbor after. At least that is my guess. So perhaps something similar is in play in this adenine switch, as it looks like it is fully capable of lassoing the adenine for a bind by getting two stems coaxial (an energy bonus) and two hairpin loops kissing (guessing that's an energy bonus too).
Further in my riboswitch search, I even found this thing called a G box.
Ah, I think this is what is called a G box = guanine binding. Ha, I think I rediscovered the G box riboswitches. :) So there is a whole category of G box riboswitches, having CU stretches in the aptamer.
If this is the case, then a C box category is needed too :) Since I have seen other riboswitches use purines for the aptamer catches. Like the FMN aptamer.
MicroRNA versus pyrimidine dangle
I think the pyrimidine/purine switch thing is also a reason why these overwhelming pyrimidine dangles are productive in microRNA labs. They are flexible too, to sniff out their partner microRNA.
And our microRNA catching designs like to have dangling tails of C’s and U’s. Plus, a combo of 4+ pyrimidines (in mixture) seems to raise the yield in cluster counts. They are often present in Exclusion lab designs with 100+ clusters. Unfortunately it also seems to hurt the KDoff.
Cooperativity influence on cluster formation:
There seems to be a slight though significant dependence of cluster formation on cooperativity. In simple terms, the higher the cooperativity, the more clusters are created. This holds as a general trend, and also for the bulk of designs, which score up to subscore 22. For all subscores higher than 30, however, the actual values appear to fall short of the predicted trend. Then again, only five of us managed to get such high cooperativity points, so it is difficult to apply any real data analysis. On the other hand, all five of these values are represented by data at >10 clusters, so they are valid designs.
Cooperativity, Round 3
A small summary of the trends I see now.
Cooperative Binding - multi MS2 (shorter)
Not happy about MS2 gates - with stems before the MS2’s
Seems to dislike great distance between MS2’s - although they can also be placed too close (few bases distance)
Not too happy about static stem between the MS2’s.
Not too happy about a neck of the tails either.
Not particularly fond of GU’s
I have been attempting to stick both static stems and necks into these labs. They haven’t been particularly willing so far.
But JR was almost getting away with both a neck and a static stem in the switching area. And it has a cluster count of 95.
Cooperative Binding - multi MS2
Carries many of the trends of the shorter lab.
Not happy about MS2 gates - with stems before the MS2’s - but might be more tolerant
Seems to like a static stem outside of the switching area
Seems to have a wider range for distance between MS2’s
Slightly opposed to necks and static stems between MS2’s, but has a bigger tolerance for them compared to the shorter lab.
Still not too fond of GU’s
But as Johan mentioned in his analysis, this longer lab had a far lower average cluster count, and a good cluster count seemed to be related to good data quality, so I have probably already said way too much about this one.
One thing I do think I can say something about is the GU content, or rather it not being strongly present. I think the reason why designs in these labs don’t need many GU’s is that the loops that form between the MS2’s already work as a more effective splitter than GU’s. The bigger the loop, the lower the resulting kcal between the pairing MS2 sequences, or whatever sequences one has pairing with each other. Thus the loops are already helping with the splitting.