We are very excited about the results which have been posted in Eterna:
and the data can be found in
Google spreadsheets: FMN/MS2 Riboswitch Structure
or the Excel file: FMN/MS2 Riboswitch Structure
I have been wondering whether we could benefit from Salish's end bits elsewhere than in the neck, in particular in relation to the FMN/MS2 Riboswitch Structure: the Paper lab.
In some of these labs the designs are rotated, so the static stem that is attached to their aptamer is not a neck, but a hairpin stem.
I am wondering if even such a "pseudo neck" could benefit from "end bits", or rather mismatches, to give a little relaxing effect - even though it can't be flexible in the same way as a neck.
Now I would like to have a Riboswitch on a chip design with a pseudo neck in it, to test this in. But since I don't, I took the second best option and used my high scorer in the R2 lab, which has a static stem in the switching area.
I removed the tetraloop G boost as I don't want it disturbing the experiment - it would end up as a boost base to some of the new "end bits". I'm aware that this may result in a different score compared to the original design, so I ran a design with that change alone. Then I added a U base in the loop to avoid more than 4 A's in a row.
Also I didn't want to add the end bits themselves in the tetraloop, as tetraloops, whatever their loop boosts and mismatches, could be far too stable for the effect to show, and would likely rather have a stabilizing (stiffening) effect. So I removed the G boost (base 37) and later added a base in the loop to avoid more than 4 A's in a row.
Experiment name: #Pseudo necks with end bits
Could you guys think of any similar experiments that could be worth doing?
There are two patterns that stand out in a good deal of the designs that receive a high global fold change - the bigger the better, as Johan says. These are designs that do great at switching not just at the concentrations the lab requires but also at the additional concentrations tested beyond that.
These design types also show the most basic way to reduce the number of magnet segments.
R3, score 100%, fold change 16.44, global fold change 12.71
Global fold change range up till 14% for now
- Lane sharing by sequence overlap between switch elements.
Reducing the area of the design complex
Reducing the amount of magnet segments
Sequence sharing seems the strongest strategy for now. I think the reason sequence sharing works so well is that, by making the sequences actively share, the inputs are pretty much made dependent on each other. If concentration affects one switch element, it is bound to affect the other too, and vice versa, since they share.
In R3 designs like jandersonlee’s winners, the input complements share lane with MS2. Here MS2 and TB B turn each other off by sequence sharing: when the one is gone, the other will be gone too. When TB A is on, TB B will be gone as they are sharing lanes, and TB A will be gone when TB B is there, as TB B is using some of the bases that TB A would otherwise have needed.
At the same time, the sequence and lane sharing results in an overall reduced area of the RNA design complex. The smaller the actual switching area, the easier the switch goes.
Static stems in the switching area
R2, score 92%, fold change 8.75, global fold change 6.78
Global fold change range up till 7%
Reducing the amount of magnet segments
I have high hopes of this strategy being particularly beneficial for designs with more than 1 input, as it is a way to strongly simplify the design - and a simpler design is easier to get switching. The fewer switching base pairs, the easier the switch.
That said, the sequence sharing and overlap strategy seems to be the overall best. Next comes introducing static stems in the switching area if possible, and should that fail, I think using the full RNA sequence with the RNA inputs spread further out is the next thing to try.
So even though the designs with a static stem in the switching area are partially locked, they have so far achieved a higher global fold change than fully moving switches without sequence sharing. Those have reached a 5% global fold change for now. I look forward to seeing if that trend continues.
Related post: Christmas snake
More on this complex placement discussion - which is related to switch design simplification.
What has Gibbs energy got to do with it?
I have been puzzled ever since we got data back from the two rounds with RNA in and RNA out. The winners in the two labs with miRNA-in, reporter-out showed totally different trends when it came to positioning of the design complex.
Here is how I define “design complex”: any RNA design that does not take up the full length of the RNA sequence. In other words, a design that leaves space at one end for tying up a dangling end into a static stem.
The reporters and the input were rather short compared to the full RNA sequence. But each lab had the design complex placed at a different end of the RNA sequence. I had expected them to land at the 3’ end as in the microRNA 208a lab. However, two input labs behave differently on this account.
Lately I have been thinking that the 5’ end is the best for whatever design complex needs a good spot on the RNA sequence. Reporter-out round 2, R3 and XNOR have their design complexes show up at the 5’ end. But that doesn’t explain the miRNA-in Reporter-out round 1 lab, which has its winners' design complexes show up at the 3’ end.
So in the latest A/B lab round, I have deliberately been moving well working design complexes to the opposite end of the RNA sequence, and did the same to some not so well working design complexes that I suspected would benefit from the move.
I noticed while moving some of these designs that the overall kcal either got more negative or less negative. The ones that were good that I moved to opposite end, got less negative. The ones that did not do well, and that I moved to the other end, got more negative. I started to wonder if there was a relation.
Here is a sum up I have written with my thoughts on the placement of the design complex in the RNA sequence.
Kcal and legos
I also recalled how Ksteppe tried to explain to Janelle and me a long time back that RNA really wants to have the most negative kcal. That didn’t make good sense to us, as we had both been seeing designs with very negative kcal - according to simulation - fail miserably in lab. However, the point is exactly that it was a simulation kcal. In nature things tend to go towards the more stable state. In switches we are walking a fine balance, as each state needs to have some stability to it, while still allowing the other to be possible too.
Janelle kept insisting on understanding this odd free energy concept, and Ksteppe found an awesome way of explaining free energy, using just legos.
Omei’s reporter labs and design complex end position
I looked at the reporter lab results again, and in both miRNA-in reporter-out labs I moved some of the top scoring designs to the opposite end. The winners in general got less negative with the move. The exceptions were a few of Omei’s designs in the round 1 miRNA-in reporter-out lab, which may actually benefit from an opposite end placement.
I think I may finally have found out why the winners in these two reporter labs have their design complexes at opposite ends of the design. Each design complex strives to get the most negative possible kcal, and the end that provides it is the end the design complex will choose.
I afterwards realized that Omei has already done an experiment that actually demonstrates this. He made two series with identical design complexes and slid one series to the opposite end from the other.
The designs that are in the winning series generally are more negative. And tend to have their design complex towards the 3’ end.
I searched with that switching sequence that Omei mentioned he had used for his experiments and got a few more of his series pop up in the search as well. Not all these designs have the design complex placed at either end, but all contain the same sequence for the design complex.
Notice how the score seems to go down in relation to the energy becoming less negative.
I can get equal or more negative energy if I slide a few of the designs. If I slide the design below to opposite end, it gets an equal energy, so this one may benefit from reversal. Case with even energy no matter which end - in simulation...
Original, Omei TB in/out 1 v3, score 100%
The V2 design in the same series actually gets a more negative kcal in the second state when slid to the opposite end, but not in the first.
This miRNA-in reporter-out round 1 lab also shows more ambivalence when it comes to scores and fold change. Those are quite fine, but are weak in comparison with the miRNA-in reporter-out round 2 lab, which with its 5' end position does far better. But these two labs also have reporters of different length, so I can't fully say which is causing what.
So perhaps a few of these round 1 winners would also be happy at the 5’ end. But my point is that negative energy as we have it for the 1 state in the data overview may actually help to say something about which end to place a design complex at.
Sum up on design complex placement
So my summed up advice for bots and humans in case they are having a design that is not taking up the full RNA sequence, is to place the design complex at whatever end, that gives the most grumpy kcal for 1 state. Make that Gibbs energy go negative!
That goes whether the design complex is due to use of lane sharing, static stems in the switching area or just plain short inputs, as in the reporter labs.
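The advice above can be sketched as a small selection routine. This is only an illustration under stated assumptions: `place_design_complex`, `design_complex`, `filler` and `energy_fn` are hypothetical names, the "filler" stands in for whatever sequence ties up the rest of the RNA, and `energy_fn` is whatever energy estimate for state 1 you trust (for example a folding engine's predicted free energy in kcal). No particular energy model is implied.

```python
def place_design_complex(design_complex, filler, energy_fn):
    """Try the design complex at either end and keep the more negative energy.

    design_complex: sequence of the switching part (hypothetical input)
    filler:         sequence filling the rest of the RNA (hypothetical input)
    energy_fn:      callable returning the state 1 energy in kcal for a
                    full sequence; supplied by the caller, not defined here
    """
    candidates = {
        "5' end": design_complex + filler,  # complex first, filler after
        "3' end": filler + design_complex,  # filler first, complex last
    }
    energies = {end: energy_fn(seq) for end, seq in candidates.items()}
    # Most negative energy wins; on a tie the 5' end is returned, because
    # min() keeps the first of equal values.
    best_end = min(energies, key=energies.get)
    return best_end, candidates[best_end], energies
```

Since simulated energies do not always match lab results, the returned dictionary of both energies is probably more useful than the single winner when the two values are close.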
Just wanted to add that PWKR's double aptamer design, is not just a double aptamer, it also seems to be a cooperative switch.
The best of the designs with double aptamers receives almost triple the fold change of the single aptamer design with the best fold change:
Best double aptamer design - fold change of 155
Best single aptamer design - fold change of 52
Johan made a sweet graphics overview of single aptamers versus double aptamers versus double aptamers with mutations in one of the aptamers.
Some of the double aptamer designs with mutations in one of the aptamers were still able to get double the fold change of a single aptamer design.
I have identified some signatures that are specific to either switches or static designs. Each has its specific base and base repeat frequency fingerprint.
Fingerprint of a switch: high amount of purine (GA) and pyrimidine (CU) repeats
Frequency differences between switches and static designs
Here are some characteristics that can be used to separate static designs from switches. On suggestion by Mat I have split the pointers so there is one set for primary sequence and one set for secondary structure.
I have used two switch and two static designs to illustrate how base frequency differs between the two types of designs. The static designs have more in common with each other than with switches, and vice versa. These frequency differences seem general for switch and static lab designs based on what I have seen so far.
Frequency spreadsheet with numbers of repeats and bases, for the example designs.
Switch designs have a high ratio of pyrimidine repeats to purine repeats, roughly 1.16 - 1.23.
Pyrimidine repeats high (33-38%)
Purine repeats high (29-31%)
Percentage of A and G bases, close to each other
Percentage of C and U bases, close to each other
- Static designs have a higher amount of repeat A’s and A bases in general, compared to switches
Static designs have a high amount of purine repeats (64-78%)
Static designs have a low ratio of pyrimidine to purine repeats. Less than 0.2
Rules for primary sequence
Characteristics of static design
Base: A repeats high
Sequence: Purine repeats high
Sequence relations: Ratio of pyrimidine to purine repeats low (0.15-0.16)
Example of A repeats (Almost all in loop area)
Example of Purine repeats (Most in loop area)
Example of low pyrimidine (green) to purine ratio (red)
Characteristics of switch designs
Sequence: Pyrimidine repeats high
Sequence: Purine repeats high
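Sequence relation: Ratio of purine to pyrimidine repeats close to even (0.80-0.85, which is the same as the pyrimidine to purine ratio of roughly 1.16-1.23 given earlier)
(Repeat bases seem to make weaker bonds - meaning they will also switch more easily.)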
Example of high pyrimidine (green) to purine ratio (red)
Rules for secondary structure
Characteristics of static design
Base pairs: Well mixed bases
Repeat bases in loops - a purine repeat stretch is just a word for loop or single base area
Purine repeats in loops
Visual: All loops, stems, internal loops, multiloops of different size
Energy modeling: Even energy distribution
Base pairing: Varied bases in stems (avoids slides and unwanted complementarity)
Example of well mixed stem bases
Example of fairly even energy distributions in stems of similar length
From the forum post Energy, structure and symmetric colors
Characteristics of switch designs
Repeat bases - in stems and switch elements (eg aptamer/MS2)
Pyrimidine repeats in stems and switch elements (eg aptamer/MS2)
Purine repeats in stems and switch elements (eg aptamer/MS2)
Visual: Similar size elements
Energy modeling: Uneven energy distribution
Base pairing: Complementary pairing between stretches of the repeat bases (purine pairs with pyrimidine)
Example of identical size elements - two repeat FMN’s
Example of some symmetry
Complementarity between repeat sequences
Beneficial repeats in static stems
Furthermore, the pyrimidine repeats and purine repeats also seem to be beneficial for switches, even in the static stem area. I have been seeing it and scratching my head. It pops up in our switches and in switches found in papers.
Earlier example from this post.
There is even a fine pattern for the single bases C and U to be closer in percentage to each other than to G and A, and for G and A to be closer in percentage to each other than to C and U. This is a logical consequence of the raised amount of purine pairing with pyrimidine in switches.
Slightly different ratio of purine and pyrimidine in ON/OFF labs?
I took the top scoring design in the 8 Logic Gate labs, (max 1.25 fold change error). In 6 of 8 designs, pyrimidine repeats had a higher ratio than purine repeats. Except for A OR NOT B (TTFT) + OR (FTTT), where purine repeat ratio dominated over pyrimidine. They are both ON switches.
There may be a tendency for ON switches to need a slightly higher ratio of purine to pyrimidine than OFF switches.
This fits the pattern of ON switch (INC) labs doing better with the TB inputs heavy in pyrimidine - because this will raise the potential for purine repeats in the design. And OFF switch (DEC) labs doing better with the TB inputs heavy in purine - because this will raise the potential for pyrimidine repeats in the design. (OFF switches want C magnet landing sites, ON switches want G magnet landing sites)
Purine/pyrimidine content - a good predictor for a good input for an ON or OFF switch?
What is more, the switch signature with a higher ratio of repeat pyrimidine compared to repeat purine may actually help explain why the Round 1/3 OpenTB inputs were less good than the Round 2 OpenTB inputs. (LINK to background post)
The first input set (OpenTB Round 1/3) has a higher ratio of pyrimidine, hence sparking a high purine content in the resulting designs. The second set (OpenTB round 2/4) has a high purine content, hence sparking a higher pyrimidine content in the designs. The latter is favorable for making working switches, working OFF switches in particular.
The high pyrimidine content of our early cloud lab switches was not a coincidence. (Periodic repeats in RNA switches - How can they be programmed?)
Perspective - What can this be used for?
- First of all making better designs. :) Knowing an optimal percentage of different bases, purine repeats, pyrimidine repeats and the ratio of pyrimidine repeats to purine repeats, will considerably up your chances of making a working switch.
- Choosing better inputs for ON and OFF switches
- Data could be pulled from a larger set of designs from a round of switches and static designs, to get the accurate percentages of bases and repeats that are normal for these design types.
Based on these pointers, it should be possible, just from a sequence, to say something about whether it is likely a stable design or may be a switch.
- These distinct signatures for switches and static designs could be fine-tuned by finding them for whole rounds in spreadsheets for the
- Eventually this could get used for what Rhiju would love to see. Digging natural RNA switches out of the sequence soup of RNA data bases or raw sequence outputs from nature.
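A minimal sketch of how such a sequence-only guess could look. It is not the author's method: the definition of a "repeat" (a run of at least two consecutive bases from the same class), the function names and the switch threshold of 1.0 are assumptions made for illustration. Only the static threshold of 0.2 comes from the pointers above, and the switch range of roughly 1.16-1.23 is reported from just a few example designs.

```python
import re


def repeat_fraction(sequence, bases, min_run=2):
    """Fraction of the sequence covered by runs of >= min_run bases from `bases`."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    runs = re.findall(f"[{bases}]{{{min_run},}}", sequence)
    return sum(len(run) for run in runs) / len(sequence)


def guess_design_type(sequence, static_max_ratio=0.2, switch_min_ratio=1.0):
    """Rough guess from the pyrimidine to purine repeat ratio alone.

    static_max_ratio: from the post (static designs below 0.2)
    switch_min_ratio: an assumed cut-off, not a measured one
    """
    purine = repeat_fraction(sequence, "GA")
    pyrimidine = repeat_fraction(sequence, "CU")
    if purine == 0:
        return "undetermined"  # ratio is undefined without purine repeats
    ratio = pyrimidine / purine
    if ratio < static_max_ratio:
        return "likely static"
    if ratio >= switch_min_ratio:
        return "possible switch"
    return "undetermined"
```

The thresholds are keyword arguments so they can be re-tuned once percentages from whole rounds are available. A large middle band is deliberately left "undetermined", since some ON switches showed purine repeats dominating.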
The structural language of RNA
I will tell a story similar to the one I told in the post above, with some of the same images, but I will tell it in a different way.
What I wish to illustrate is that just the frequency of bases used in a small stretch already spills quite a lot about what kind of structure it will make. What it ends up forming will, however, also depend on the base frequency of the whole sequence and on complementarity from other subsequences.
How to spell the word LOOP in a RNA sentence
When I use a stretch of mainly A bases (but not purely A for 6+ base stretches), this is as if I’m using a word that says loop.
These A stretches are really just words for something that will be loop or single bases, like multiloop ring or gap bases.
GAAAUAA says internal loop or hairpin loop
AAUAA says internal loop, or gap bases or multiloop bases
That is if the overall sentence is to spell static RNA design.
How to spell the word Stem in a RNA sentence
Similarly, if I use a small stretch of well mixed bases - a mix of all 4 bases, A, U, C and G - then I'm really saying find a partner and make a stem, or fold up with yourself if there is internal complementarity.
Stem sequences in static designs have a different base frequency from loops. Loops are high in A base frequency, stems are not. The stem base frequency will change slightly depending on whether the stem is long or short. Shorter stems will tend to be higher in GC content; longer stems will tend to be higher in AU content and hold more GU too. Similarly, longer stems can tolerate more repeat bases than shorter stems.
The surroundings change the meaning of a word
So to get to a specific structure, you will need to put together the right words. Plus make sure their surroundings do not make them mean something entirely different from what you intended. Nice doesn't mean nice if it is placed with a couple of other words like this: Not so nice.
Stems in switches and static designs differ in base frequency. Switches harbor many more pyrimidine and purine stretches - in other words, less well mixed base pairs.
What will determine if a sequence spells static design or switch, is its general base frequency, its amount and type of repeat bases plus their ratio in relation to each other.
Stretches of purine in static designs also tend to mean loop, or loop plus a G base from a stem. If the overall sequence is to spell switch design, the exact same subsegment with a stretch of purine bases can mean build switch element - be it part of a switching loop or a switching stem.
If the sequence has a high amount of purine repeats, but a low amount of pyrimidine repeats, then there is a likelihood of a static design. (Above)
If the sequence has a high amount of purine repeats, but also a high amount of pyrimidine repeats, then the likelihood of a switch design is high. (Above)
What structure a specific subsequence will become depends on whether there is complementarity inside itself or whether there is another "make stem region" that is complementary. So even short stretches of bases already have orders (demands) built into them.
GCAUC - spells a strand of a potential stem 5 bases long. Especially if there is GAUGC elsewhere
GCAUCAUAAGAUGC says hairpin stem
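The two "words" above can be checked with a few lines of code: GAUGC is the reverse complement of GCAUC, and GCAUCAUAAGAUGC contains both strands around an AUAA loop. The sketch below only considers Watson-Crick pairs (GU wobble pairs are ignored, which is a simplifying assumption), and the function names are made up for the illustration.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the strand that pairs with `strand`, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def find_hairpin(sequence, stem_length=5, min_loop=3):
    """Return (5' strand, loop, 3' strand) of the first hairpin word found.

    A hit only means the sequence spells "hairpin stem"; it does not
    prove the stem forms in the context of a larger design.
    """
    sequence = sequence.upper()
    last_start = len(sequence) - 2 * stem_length - min_loop
    for start in range(last_start + 1):
        left = sequence[start:start + stem_length]
        partner = reverse_complement(left)
        # The partner strand must begin after a loop of at least min_loop bases.
        position = sequence.find(partner, start + stem_length + min_loop)
        if position != -1:
            loop = sequence[start + stem_length:position]
            return left, loop, partner
    return None


print(reverse_complement("GCAUC"))     # GAUGC
print(find_hairpin("GCAUCAUAAGAUGC"))  # ('GCAUC', 'AUAA', 'GAUGC')
```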
What structure will form is all a matter of the frequency of the bases that are put together, plus how many repeats there are, plus whether there is potential complementarity.
To make stems, specific complementarity is needed. To make multiple stems, the stems should preferably all be different, both in their internal solving sequence and in size. Otherwise the puzzle is not going to be stable.
On the other hand, for switches having several stems solved exactly the same way and in the same length, is a way to secure a controlled switch, between alternative conformations.
Designing for a specific structure from scratch
You need words that have the correct frequency of bases in them, to really spell loop when loop is needed or really spell stem when stem is needed. Plus, each time you add a new stem, you need to make sure its word does not match up with the word for another stem.
I think this focus, not just on making complementarity between elements (satisfying the in-game energy models) but also on the correct base frequency to fit the desired puzzle type (switch/static) and on using the correct repeat base ratio, can actually be used to increase the likelihood of building a wanted structure from the get go. Plus, last but not least, rule out a ton of unwanted sequences.
As I love to say, RNA is really only a game of frequencies. Use the right ones!
Thx to Mat for discussion
Switches love repeat bases and repeat sequence in particular. But not all kinds of repeat bases are equally welcome in a switch.
I will use a couple of the designs that I shared in the posts above, for illustration.
Static design with lots of repeat A's.
NB: The A repeats with weak yellow are outside the design itself. Belonging to the design scaffold. But still there are double the amount of A repeats and they are longer.
Switch design with low amount of repeat A's.
NB: Here there is even an extra FMN aptamer, raising the amount of A repeats by 1.
A script to determine purine and pyrimidine content of a sequence
I have made a script (Purine and pyrimidine repeats finder 1.04) that can count the purine and pyrimidine repeats in percentage and print them.
Here is a demonstration of the script run on the above designs.
Sequence of static design:
Sequence of switch design: AUGGUUGUCGUAGGAUAUGUAGGAUAUUCCUACAUGAGGAUCACCCAUGUUCGCAUGGAAGAAGGACAGAAGGACGAUAACCAC
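For readers who want to try something similar, here is a minimal sketch of such a counter. It is not the "Purine and pyrimidine repeats finder" script itself, whose exact counting rule is not given here. The sketch assumes a repeat is a run of at least two consecutive bases from the same set and reports the percentage of the sequence covered by such runs, so the numbers may differ from the original script's output.

```python
from itertools import groupby

PURINES = frozenset("GA")
PYRIMIDINES = frozenset("CU")


def repeat_percentage(sequence, base_set, min_run=2):
    """Percentage of bases sitting in runs of >= min_run bases from base_set."""
    sequence = sequence.upper()
    if not sequence:
        return 0.0
    bases_in_runs = 0
    # groupby splits the sequence into alternating member / non-member runs.
    for is_member, group in groupby(sequence, key=lambda base: base in base_set):
        run_length = len(list(group))
        if is_member and run_length >= min_run:
            bases_in_runs += run_length
    return 100.0 * bases_in_runs / len(sequence)


def repeat_report(sequence, set_a=PURINES, set_b=PYRIMIDINES):
    """Repeat percentages for two base sets plus the ratio of set_b to set_a."""
    pct_a = repeat_percentage(sequence, set_a)
    pct_b = repeat_percentage(sequence, set_b)
    ratio = pct_b / pct_a if pct_a else None  # None when the ratio is undefined
    return {"a_pct": pct_a, "b_pct": pct_b, "b_to_a": ratio}


switch = ("AUGGUUGUCGUAGGAUAUGUAGGAUAUUCCUACAUGAGGAUCACCCAUGUUCGCAUGG"
          "AAGAAGGACAGAAGGACGAUAACCAC")
print(repeat_report(switch))
```

The two base sets are parameters, so the same function can be pointed at other groupings of bases by passing different sets.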
Running the same script with GU and CA instead:
The more repeat A's, the less of a switch
If I run the script searching for A repeats instead, it looks this way:
Other repeat base markers of switches
I have been scolding longer stretches of GU's in switches. And I still think they are disadvantageous if the stretch of GU's is really long.
However, when I ran my base count script with the bases changed to GU and CA repeats instead, they got a ratio similar to that of pyrimidine to purine repeats. Roughly something like 70%, which is skewed compared to a random distribution. If it had been more random, I would have expected a 50% ratio. Another rough guess for switch base ratios is 40% GA, 30% CU, 40% CA and 30% GU.
Again this is a base repeat ratio that differs between switches and static designs. If I run the two designs through a modified version of the script, there is a clear difference in the ratio of CA to GU between switches and static designs:
Static design, a ratio of CA to GU of around 3.
Switch design, a ratio of CA to GU of around 1
The longer the base and sequence repeats get, the more different they get. Similarly, they can't all be the same length.
Potential explanation for switch repeat sequence?
I have been wondering whether another mechanism is involved in switching too, because CA and GU repeats are not of the same kind as purine and pyrimidine repeats. They are more like the teeth of a zipper. I think that, combined with repeat bases, may make for weakness and a potential switch.
Sum up: Where do switches and static designs differ?
- Switches hold far fewer A repeats than static designs, and the A repeats are shorter
- Switches hold more C repeats than static designs
- The more and longer the A repeats in a switch, the lower the switch is likely to score
- When switch designs get stretches of 3 or more repeat A's, it starts hurting their scores. The same goes if there are a lot of shorter A repeats. As a quick guess, I would recommend a limit of max 10% A repeat bases.
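The A repeat pointers above can be turned into a quick check. This is a sketch under assumptions: an A repeat is taken to be a run of two or more A's, the 10% limit is only the quick guess from the list, and the maximum run of 4 is the in-a-row limit used in the labs. The function names are made up for the example.

```python
import re


def a_repeat_stats(sequence, min_run=2):
    """Count A runs of >= min_run bases and how much of the sequence they cover."""
    sequence = sequence.upper()
    runs = [match.group() for match in re.finditer(f"A{{{min_run},}}", sequence)]
    repeat_bases = sum(len(run) for run in runs)
    return {
        "runs": len(runs),
        "longest": max((len(run) for run in runs), default=0),
        "repeat_pct": 100.0 * repeat_bases / len(sequence) if sequence else 0.0,
    }


def a_repeat_warnings(sequence, max_run=4, max_repeat_pct=10.0):
    """Return human-readable warnings for a candidate switch sequence."""
    stats = a_repeat_stats(sequence)
    warnings = []
    if stats["longest"] > max_run:
        warnings.append(f"A run of {stats['longest']} exceeds {max_run} in a row")
    if stats["repeat_pct"] > max_repeat_pct:
        warnings.append(
            f"{stats['repeat_pct']:.1f}% A repeat bases exceeds {max_repeat_pct}%"
        )
    return warnings
```

Both limits are keyword arguments, since the 10% figure is a guess that would need checking against full rounds of lab data.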
Here I search the AC/INC lab from OpenTB round 2 by amount of repeat A's. The more A's, the fewer winners.
Repeat A's in full rounds
A similar pattern goes for the whole round of the 101 riboswitch lab:
So far I can recall seeing just one winning switch design with 7 A bases. As far as I recall it was one of Omei's.
It was before repeat A's got limited to max 4 in a row.