Switch Scores for EteRNA Switch Puzzles

Updated 8 months ago
An exciting direction in EteRNA is the study of riboswitches!

We have recently finished our pilot experiments with great initial success. Using a new technique that measures switching directly on a sequencing chip, we can observe the switching of thousands of designs at once. The signal is generated by a fluorescent RNA-binding protein, MS2, and instead of the standard EteRNA score, which is based on the correct folding of each base, we have introduced a new Switch Score.

The Switch Score (0 - 100) has three components:
1) The Switch Subscore (0 - 40)
2) The Baseline Subscore (0 - 30)
3) The Folding Subscore (0 - 30)
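As a minimal sketch of how the three parts add up, assuming the total is simply the sum of the three subscores, each capped at the range listed above (the per-subscore formulas are in the linked PDF and are not reproduced here; `switch_score` is a hypothetical helper name):

```python
# Hypothetical helper. Assumes the Switch Score is the plain sum of the three
# subscores, each clamped to its stated range; the real per-subscore formulas
# are described in the linked PDF and are not reproduced here.
SUBSCORE_MAX = {"switch": 40.0, "baseline": 30.0, "folding": 30.0}


def switch_score(switch: float, baseline: float, folding: float) -> float:
    """Combine the three subscores into a 0-100 Switch Score."""
    parts = {"switch": switch, "baseline": baseline, "folding": folding}
    # Clamp each subscore to [0, max] before adding, so the total stays within 0-100.
    return sum(min(max(value, 0.0), SUBSCORE_MAX[name]) for name, value in parts.items())


print(switch_score(switch=12.5, baseline=30, folding=30))  # 72.5
```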

The scoring scheme is summarized below. A more detailed description is given in this PDF:
https://drive.google.com/open?id=0B_N0OA9NROPGel80SG5LM0wtZms&authuser=0

A typical example of a switch puzzle is shown below:


The player designs the structures in [1*] and [2]. To observe the switching we then measure the fluorescent signal of MS2, which binds specifically to the MS2 hairpin seen in [2]. In the absence of FMN, the MS2 should bind and the switch is ON. On the other hand, if we introduce FMN, the ligand in [1*], the switch should be OFF and not exhibit fluorescence.

No switch is 100% ON or OFF in the absence or presence of ligand, but a good switch can come very close (and get a perfect EteRNA Switch Score!). At some MS2 concentration, the difference between the two states should be large (e.g., at ~100 nM MS2 in the figure below). In practice, we don't know this concentration beforehand, so instead we perform measurements at many concentrations to obtain binding curves. When the switch turns OFF (red curve), the effective dissociation constant increases. The dissociation constant, Kd, is the MS2 concentration at which half of the RNA is bound to MS2.


The Switch Subscore quantifies how far apart the Kd's are in the absence and presence of FMN (horizontal distance between the red and blue curves).
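The exact Switch Subscore formula is in the PDF. As a rough illustration only, assuming the binding curves are plotted on a logarithmic MS2 axis, the "horizontal distance" between the red and blue curves is the log of the fold change in Kd. `kd_separation` below is a hypothetical helper, not part of any Eterna code:

```python
import math


def kd_separation(kd_no_fmn: float, kd_with_fmn: float) -> float:
    """log10 of the Kd fold change, i.e. the horizontal gap between the two curves
    on a log10 MS2 axis. Positive means FMN raised the Kd (the switch turned OFF)."""
    if kd_no_fmn <= 0 or kd_with_fmn <= 0:
        raise ValueError("Kd values must be positive")
    return math.log10(kd_with_fmn / kd_no_fmn)


# Hypothetical design: Kd of 20 nM without FMN and 200 nM with FMN (a 10-fold change).
print(kd_separation(20.0, 200.0))  # 1.0
```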

The Baseline Subscore is a measure of how close the ON-state is to the original MS2 hairpin (lower Kd is better, i.e., the blue curve should be far to the left).

The Folding Subscore is high if MS2 binds properly in the ON-state at any concentration (the blue curve should reach high values at high concentrations of MS2, i.e., to the right).

In our first experiments, we found that the easiest score to maximize is the Folding Subscore, followed by the Baseline Subscore. These two ensure that the MS2 hairpin is properly formed in the ON-state. The hard one is the Switch Subscore, which is highest when the energy difference between the states is finely tuned to the energy conferred by binding to FMN (or other future ligands).
johana, Researcher

Posted 5 years ago

salish99
And, finally, the score to subscore comparison:

Switch SubS:


Switch SubS by experiment:


Baseline SubS


Baseline SubS by experiment:


Folding SubS:


Folding SubS by experiment:


My questions:
1) If the folding subscore is below the 100% achievable, should the molecule automatically be discarded (I don't think so)? We don't get the structure of the actual fold as a result of the MS2 experiments.
2) The highest Switch Subscore was achieved in design 4815498, yet its overall score is not the highest achievable. So switching is important, but not of the utmost importance. Why?
3) There is a set of designs (e.g. 4832785, 4835726) that have a 100% baseline score and a little extra on the switch score, and four designs (4830304, 4832773, 4833596, 4833800) with a 100% baseline score (30) and 0 everywhere else. This makes for an interesting data set, as these cluster at the top in baseline while being clearly separated from the rest of the pack. Should such designs be avoided?
johana, Researcher
Beautiful plots!

Here are my answers for your questions:

1. The folding score is the easiest to max out, and all successful designs should have a maximum score (30). Designs with low folding scores are not great and should probably be discarded.

2. It turns out that even for an ideal switch, there is a trade-off between switching magnitude and the baseline score: the higher the Kd in the ON-state, the higher the theoretical fold-increase in Kd. That is why we included the baseline score, since otherwise you'd get a better switching magnitude by simply increasing the Kd in the ON-state (i.e., making it bind MS2 less well).

3. The designs you mention should be avoided! If the folding score is low, it means that the switches never bind MS2. Because of this, the low signal can lead to erroneous fit values that don't mean much in practice. For the switches you mention, the apparent Kd is actually much lower than for the MS2 hairpin itself, which doesn't make sense. We reported all the data without flagging these specifically, but you made an astute observation regarding the usefulness of these.

I like to think of the three subscores as three hurdles one must pass.
1. If the folding score is less than 30, it's not a good design.
2. If the baseline score is less than 30, it does not bind MS2 very well in the ON-state and is therefore less interesting.
3. The switching subscore represents the part of the challenge where interesting things happen and where great designs are separated from the rest.

Going forward, we should definitely try to pass hurdles 1 and 2, i.e., aim for maximum folding and baseline scores. Then we'll do the best we can for the switching. For the first round, I was very impressed by the different solutions. With the great analysis above I hope that we can push it even further next time around.
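The three hurdles translate directly into a filter followed by a sort. A minimal sketch with made-up designs; the `Design` record, its field names and the design IDs are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Design:
    design_id: int   # made-up IDs, not real Eterna design numbers
    switch: float    # Switch Subscore, 0-40
    baseline: float  # Baseline Subscore, 0-30
    folding: float   # Folding Subscore, 0-30


def rank_by_hurdles(designs: list[Design]) -> list[Design]:
    """Hurdles 1 and 2: keep only maxed-out folding and baseline. Hurdle 3: sort by switching."""
    passed = [d for d in designs if d.folding >= 30 and d.baseline >= 30]
    return sorted(passed, key=lambda d: d.switch, reverse=True)


pool = [
    Design(1, switch=35, baseline=22, folding=30),  # switches well but fails hurdle 2
    Design(2, switch=18, baseline=30, folding=30),
    Design(3, switch=0, baseline=30, folding=12),   # fails hurdle 1
    Design(4, switch=25, baseline=30, folding=30),
]
print([d.design_id for d in rank_by_hurdles(pool)])  # [4, 2]
```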
salish99
Need to find a way to post the original xlsx and pdf files...
salish99
Ok, so I finally got to the ratios:

First, impact on the Eterna score:
GU/AU:


GU/GC:


and
GC/AU:


As the ESc includes all subscores, it's difficult to make a detailed analysis on the exact reason for the data distribution, so looking closer at the subscores now.
First, the baseline and folding subscores; to give the conclusion away in advance, these don't tell us anything:

BlSc:


FSc:


SOOOOO, now we take a close look at the switch subscore, arguably the more important part for this particular lab:
GC/AU (note that I chose the log scale to stretch and thus emphasize the outliers a bit):


GU/GC:


GU/AU:


These give us a direct measure of favorable basepair ratios, as printed in the graphs, i.e.:
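GC/AU ratio: Good results: 0.4 smaller than GC/AU smaller than 1.75, Best results: GC/AU = 1
GU/GC ratio: Good results: GU/GC smaller than 0.33, Best results: GU/GC = 0
GU/AU ratio: Good results: GU/AU smaller than 0.4, Best results: GU/AU = 0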
salish99
some text is missing, can't use the smaller than signs (interpreted as html)
These give us a direct measure of favorable basepair ratios, as printed in the graphs, i.e.:
GC/AU ratio: Good results: 0.4 smaller than GC/AU smaller than 1.75, Best results: GC/AU = 1
GU/GC ratio: Good results: GU/GC smaller than 0.33, Best results: GU/GC = 0
GU/AU ratio: Good results: GU/AU smaller than 0.4, Best results: GU/AU = 0
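One way to compute these ratios for a design is to count the pair types in its predicted dot-bracket structure and test them against the ranges above. A minimal sketch, assuming the ratios are taken over pair counts in a single pseudoknot-free structure; `pair_counts` and `ratio_flags` are hypothetical names, and the test hairpin is made up:

```python
from collections import Counter

# Canonical pair types after sorting the two bases alphabetically.
PAIR_TYPES = {"CG": "GC", "AU": "AU", "GU": "GU"}


def pair_counts(sequence: str, structure: str) -> Counter:
    """Count GC, AU and GU pairs in a pseudoknot-free dot-bracket structure."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    sequence = sequence.upper()
    stack, counts = [], Counter()
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            j = stack.pop()
            bases = "".join(sorted(sequence[j] + sequence[i]))
            counts[PAIR_TYPES.get(bases, "other")] += 1
    return counts


def ratio_flags(counts: Counter) -> dict[str, bool]:
    """True where a ratio falls in the 'good results' range listed above."""
    gc, au, gu = counts["GC"], counts["AU"], counts["GU"]
    return {
        "GC/AU": au > 0 and 0.4 < gc / au < 1.75,
        "GU/GC": gc > 0 and gu / gc < 0.33,
        "GU/AU": au > 0 and gu / au < 0.4,
    }


counts = pair_counts("GAUGAAAACAUC", "((((....))))")  # made-up hairpin: 2 GC, 2 AU
print(ratio_flags(counts))  # {'GC/AU': True, 'GU/GC': True, 'GU/AU': True}
```

The same `pair_counts` call works for either state's structure string, so the ratios can be compared between the unbound and FMN-bound predictions.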
Eli Fisker
I love what you have done here. I think this is very useful for future designing and for estimating what will be a good design.

I would love for EternaBot to have a "talk" with you and incorporate these base pair ratio checks. I think the analysis we have made in here should be very usable for proposing MS2 strategies.

I just recently learned that our eternabot has been revived. :)

Chat excerpt from the other day:

Hyphema: i see eternabot is making designs. did you decide to bring it up from the dead or has it been used all the while?
Nando: it has nothing to do with me
Nando: there's a rotation student at the Das lab at the moment, she's been working on reusing the eternabot code to produce these candidates

I'm wondering how base-versus-base ratios would do.

I see C’s will be less numerous than G’s. I think there will be a relationship between some of the bases that we can use to call out bad designs. E.g., I would expect designs that have more C’s than G’s to do less well than those that have a few fewer C’s than G’s.
salish99
Yes, let's put these into eternabot, maybe specific to MS2 labs.
Meechl
I think it's interesting to compare the basepair% and the score, but I also wonder how useful it is to do so, because we don't really know if those are the same pairs the RNA actually fold into.

But, if we do want to look at the basepair composition, I think it'd be cool to compare the first state to the second state. However, that'd require some work to get the basepair% for the second state. Anyone know of an easy way either to get that or to get the structure of the second state? Maybe by modifying this script? http://nando.eternadev.org/web/script...
Eli Fisker
Meechl, I like your idea with comparing base pair composition across states.

You are right that we can't know the actual base pair composition. All we can know is what the energy model tells us.

However, we can know the base composition, so if we plot base-versus-base ratios against score, I think this can give us something similar to the base pair composition. It won't tell us the exact number of base pairs, but it will tell us which ratios of, e.g., C's versus G's are most likely in the best designs.
Omei Turnbull, Player Developer
Meechl, I created a new version of Nando's script that adds the two structure strings. But I haven't done any testing to verify they matched what one sees in the UI. Caveat emptor.
wuami, Researcher
Hi all - I am the aforementioned rotation student currently working on eternabot for switch puzzles. These analyses are really great, and I would love to hear your ideas about how to implement some kind of scoring based on base/base pair composition.
Meechl
Thanks, Omei! I checked a random design and it was the correct structure, so I'm going to assume it works perfectly. :) I'm hoping to have time today to use this to calculate the basepair% for the designs in the second state.
Eli Fisker
Hi Wuami!

This is sweet news!

Big thx for reviving our EternaBot. :)

Ok, I will propose a few strategies, based on the graphs and spreadsheets by Meechl, Omei and Salish. Plus some loose estimates of mine. Each small strategy on its own won’t make the big difference. A lot of bad designs will still not be ruled out, since they can also share characteristics with the good designs, but together these strategies can help EternaBot target its way closer to recognizing good designs from bad.

I basically think we can make EternaBot perform better on the MS2 switch designs, if it gets a pointer to look for a few specific things, based on the top scorers from round 1 and the general data trends so far.

When we make strategies for EternaBot, we usually say what we think is Good, Okay and Bad, and on the dev end the strategy gets adjusted to run at its best. So please don’t take the values as precise estimates, just as pointers to what we think is a good range. What I give you here are rough outlines of strategies.

We usually write our strategies with plain words and have no idea how to talk with the bot. Thank you for your interest and for talking with the bot for us.

STRATEGY PART

Good base pair ratios

Estimate of numbers roughly taken from Salish’s graphs

GU ratios

More than 13% GU’s ---> Bad
13% to 10% ---> Less bad
7% to 9% ---> Somewhat okay
0-6% ---> Better, but GU’s are not really needed (according to how the energy model folds the RNA; Salish was sceptical about this too, and we have no way of knowing what really happens.)

GC ratio

0-9% ---> Bad
10-20 % ---> Okay
21-27% GC ---> Good (Best range taken from Salish’s GC% vs ESc graph)
28-40% ---> Okay
41% and above ---> Bad

AU ratio

0-4% ---> Bad
5-14% ---> Okay
15-21% ---> Good (Best range taken from Salish’s AU% vs ESc graph)
22-33% ---> Okay
34% and above ---> Bad
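These bands map naturally onto a sorted lookup table. A minimal sketch, assuming the percentages are the same quantity as on Salish's graphs (the posts don't say whether that is a share of bases or of pairs) and that a value falling in a gap between two listed ranges, such as 6.5% GU, goes into the next band up; `BANDS` and `classify` are hypothetical names:

```python
import bisect
import math

# (inclusive upper bound in %, label), transcribed from the three lists above.
BANDS = {
    "GU": [(6, "better"), (9, "somewhat okay"), (13, "less bad"), (math.inf, "bad")],
    "GC": [(9, "bad"), (20, "okay"), (27, "good"), (40, "okay"), (math.inf, "bad")],
    "AU": [(4, "bad"), (14, "okay"), (21, "good"), (33, "okay"), (math.inf, "bad")],
}


def classify(pair_type: str, percent: float) -> str:
    """Return the label of the first band whose upper bound is >= percent."""
    bands = BANDS[pair_type]
    bounds = [upper for upper, _ in bands]
    return bands[bisect.bisect_left(bounds, percent)][1]


print(classify("GU", 8), classify("GC", 24), classify("AU", 35))  # somewhat okay good bad
```

Turning the labels into numeric rewards and penalties is left to whoever tunes the bot, as described above.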

Base pair vs base pair ratios according to Salish

I think the test Salish did with GC ratio versus AU ratio was smart, as it gives an idea of which ratios are good for the potential base pairs, even if we can’t know anything about how things actually paired.

So I will suggest that you penalize/reward according to the base pair ratios that Salish gave:

GC/AU ratio: Good results: 0.4 smaller than GC/AU smaller than 1.75, Best results: GC/AU = 1
GU/GC ratio: Good results: GU/GC smaller than 0.33, Best results: GU/GC = 0
GU/AU ratio: Good results: GU/AU smaller than 0.4, Best results: GU/AU = 0

But since we can’t say anything about actual base pairs in the bound and unbound states, another layer needs to be added. I think the strongest way to get an idea of whether a design is in a good range is to compare base-versus-base ratios, sorted by score, and to reward designs for having good base ranges.

Base to base ratio

I think the following base/base comparisons could be interesting.

Compare the ratio of C’s to G’s

If the number of C’s is about half to 2/3 of the number of G’s ---> Really good (rough estimate based on Meechl’s MS2 spreadsheet)
If there are fewer C’s than G’s ---> Good
If the number of C’s = the number of G’s ---> Less good
If there are more C’s than G’s ---> Bad

Compare ratio of C's to U’s

U’s seem to follow the range of C’s to a certain degree.

If there are twice as many U’s as C’s ---> Okay
If there are somewhat more U’s than C’s ---> Good
If U’s are similar in number to C’s ---> Okay
If there are a lot fewer U’s than C’s ---> Bad

Compare the ratio of A’s to U’s

The ratio of U’s to A’s can vary wildly; there can be 5-6 times more A’s than U’s. So it might not be the best strategy pinpointer. However, this usually holds:

If there are fewer U’s than A’s ---> Good
If the number of U’s = the number of A’s ---> Less good
If there are more U’s than A’s ---> Bad
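The C-versus-G and U-versus-A rules are crisp enough to code directly; the C-versus-U rules would first need numeric cutoffs for "some more", "similar" and "a lot less", so they are left out here. A minimal sketch, reading "half than or 2/3" as the range 1/2 to 2/3; the function names and test sequence are hypothetical:

```python
from collections import Counter
from fractions import Fraction


def c_vs_g(seq: str) -> str:
    counts = Counter(seq.upper())
    c, g = counts["C"], counts["G"]
    # Exact fractions avoid rounding trouble at the 1/2 and 2/3 boundaries.
    if g > 0 and Fraction(1, 2) <= Fraction(c, g) <= Fraction(2, 3):
        return "really good"
    if c < g:
        return "good"
    if c == g:
        return "less good"
    return "bad"


def u_vs_a(seq: str) -> str:
    counts = Counter(seq.upper())
    u, a = counts["U"], counts["A"]
    if u < a:
        return "good"
    if u == a:
        return "less good"
    return "bad"


design = "GGAAGCAGCUAGCGAU"  # made-up: 6 G, 3 C, 5 A, 2 U
print(c_vs_g(design), u_vs_a(design))  # really good good
```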

I think there will be more base/base ratio comparisons that could give a good pointer to things that may either help or cripple an MS2 switch.

Good base range

Ask for min and max base % for being in a good range. What are good A, U, G or C base % in good designs?

Max and min % G's - Good range 15-25% (according to Salish’s graph G% versus ESc)

Max and min % C's - Good range 10-14% (according to Salish’s graph C% versus ESc)

Max and min % U's - Have a huge spread, so they will be a bad pinpointer. Better to penalize only the clearly illegal ranges.

Max and min % A's - Have a huge spread, so they will be a bad pinpointer. Better to penalize only the clearly illegal ranges.

This can be made into a strategy, like the first strategy (Good base pair ratios), just by looking at Salish’s graphs on base percentages and saying which % ranges will be good, bad and so on.

Reward designs for being in a good range for each base. To either side, start penalizing when a design gets either below or above a good base range. Penalizing increasingly the further outside of a good base range the design gets.

I think a rather narrow range of C’s especially, and of G’s to some degree, is much more important for a design’s well-being than having a very specific amount of U’s and in particular A’s.

So I suggest penalizing harder for C’s getting out of optimal range, than G’s getting out of optimal range, than U’s getting out of optimal range, than A’s getting out of optimal range.
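The "reward inside, penalize increasingly outside" idea can be sketched as a weighted distance from each good range. This assumes a linear penalty and hypothetical weights (C weighted twice as heavily as G); only the C and G ranges are filled in because those are the ones given above, and all names are illustrative:

```python
from collections import Counter

# Good ranges in % of all bases, as read off Salish's graphs above.
GOOD_RANGE = {"C": (10.0, 14.0), "G": (15.0, 25.0)}
# Hypothetical weights: C is penalized harder than G. U and A could be added
# with wider ranges and smaller weights once those ranges are agreed on.
WEIGHT = {"C": 2.0, "G": 1.0}


def base_percentages(seq: str) -> dict[str, float]:
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq.upper())
    return {base: 100.0 * counts[base] / len(seq) for base in "ACGU"}


def range_penalty(seq: str) -> float:
    """0 inside every good range; grows linearly with the distance outside it."""
    pct = base_percentages(seq)
    penalty = 0.0
    for base, (low, high) in GOOD_RANGE.items():
        distance = max(low - pct[base], 0.0, pct[base] - high)
        penalty += WEIGHT[base] * distance
    return penalty


print(range_penalty("G" * 4 + "C" * 2 + "A" * 8 + "U" * 6))  # 0.0  (20% G, 10% C)
print(range_penalty("C" * 5 + "G" * 2 + "A" * 8 + "U" * 5))  # 27.0 (C 11 points over, G 5 under)
```

A quadratic distance would punish large excursions harder; which shape works better is something the bot's tuning would have to decide.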

Salish, Meechl and Omei, feel free to correct and expand anything I say here.

Help us make good bot strategies

Everybody, please feel free to add your ideas. What do you think will make for a good MS2 switch? If we can point out a characteristic of good designs in the MS2 labs, then EternaBot has a chance to look for it and become better.

Wuami can help the bot see the pointers and the bot can then do machine learning.
salish99
Omei, your link doesn't seem to be working (only http://, no text thereafter)
salish99
Maybe we can put in some delimiters, even on the U's and A's:

C% must be < 20% and should be 10-14%
G% must be < 30% and should be 15-25%
U% must be < 20%
A% must be 30-70% (well, this leaves quite some margins) and should be 40-60%
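These limits split into hard "must" bounds and softer "should" ranges, which suggests a three-level check. A minimal sketch, assuming percentages of all bases, treating every "must" upper bound as strict (as the "<" limits are written; for A this also applies at 70%) and the "should" ranges as inclusive; the labels and names are hypothetical:

```python
from collections import Counter

# (must_low, must_high, should_low, should_high) in % of all bases, from the post above.
# U has no "should" range; 0 stands in for the lower bounds the post leaves open.
LIMITS = {
    "C": (0.0, 20.0, 10.0, 14.0),
    "G": (0.0, 30.0, 15.0, 25.0),
    "U": (0.0, 20.0, None, None),
    "A": (30.0, 70.0, 40.0, 60.0),
}


def check_composition(seq: str) -> dict[str, str]:
    """Label each base 'fail', 'acceptable' or 'preferred' against LIMITS."""
    counts = Counter(seq.upper())
    report = {}
    for base, (must_low, must_high, should_low, should_high) in LIMITS.items():
        pct = 100.0 * counts[base] / len(seq)
        if not must_low <= pct < must_high:  # "must" upper bounds are strict
            report[base] = "fail"
        elif should_low is not None and should_low <= pct <= should_high:
            report[base] = "preferred"
        else:
            report[base] = "acceptable"
    return report


print(check_composition("G" * 4 + "C" * 2 + "U" * 3 + "A" * 11))
# {'C': 'preferred', 'G': 'preferred', 'U': 'acceptable', 'A': 'preferred'}
```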
Eli Fisker
Salish, great. This looks good for a strategy. Perhaps add some minimums also. But it should do.
salish99
Apparently, my files are too complicated for FusionTables.
I have uploaded them here:
http://eternawiki.org/wiki/index.php5... (old analysis word file)
and here, incl all the graphs posted above:
http://eternawiki.org/wiki/index.php5...
and
Omei Turnbull, Player Developer
Yes, fusion tables are limited in the data types they can hold. For our purposes, basically only text and numbers.

The switch graphs are not actually in the table; only their URLs are.
salish99
Well, all graphs above can be found here (I also color coded the experiments, so they can be distinguished more easily):
http://eternawiki.org/wiki/index.php5...
salish99
Hm, tried to convert this to a fusion table, but unfortunately all the graphed tabs are lost that way
https://www.google.com/fusiontables/D...
will have to continue with posting this to the wiki.
Omei Turnbull, Player Developer
New data available

Thanks to data gathering and publication by Johan and Meechl, the merged fusion table now has the following data.

* FMax (no FMN), plus standard error
* FMax (with FMN), plus standard error
* KdOFF, plus standard error
* KdON, plus standard error
* FoldChange, plus standard error
* Free energy estimate for both states
* Free energy difference between the two states
* Dot bracket notation of predicted structures for both states

(The last three items were generated with an extended version of the script Nando provided.)

As the fusion table grows, it's getting a little messy, with some duplicate columns, differing conventions for column names, numbers with ridiculous precision, and so on. But right now I'm more interested in making the data available than in making it neat. Please let me know if you find outright errors, though.
Meechl
I also was able to add the number and percent basepairs for the second state (hopefully correct ones :), and add them to my spreadsheet here:

MS2

Omei, it'd be great if you could add them to the fusion table too. :)
Omei Turnbull, Player Developer
Done :-)
Omei Turnbull, Player Developer
There are all kinds of interesting things to be teased out now. Here's just one. I've always been interested in the role that GU pairs play in RNA. Here's a series of graphs showing how the scores on the Exclusion labs vary with the number of GU pairs in the estimated second (FMN-bound) state.

Fewer than half of the designs had any GU pairs at all. Yet those few that had more GU pairs tended to score better. If nothing else, this says to me that we should put more emphasis on higher GU designs to get more data about them.
salish99
Can pictures be imported?
Omei Turnbull, Player Developer
Unfortunately, no. What is possible is to put the images in the cloud and import the URLs for them. That's how it is that the switch charts appear in the table.

The Eterna devs certainly have the capability to put other images up on Amazon.
They do that now with a simple target structure image for each lab:


Perhaps if we were to mock something up that made a compelling case for why a page that incorporates both numerical values and graphs would be a big boon to understanding switches, they would put up more images.
Eli Fisker
I would definitely want an easier way to look through the designs, without having to click my way through them. It would be a monster advantage for actually analyzing some of the ton of data we are going to get back next round. It will even help us big time with designing, as we also look up designs while we design. You have my vote. :)
rhiju, Researcher
It's going to be hard for us to deploy the images in the in-game sequence browser now, as developer attention is focused on the next generation of puzzles. In particular, we're still figuring out what the 'ultimate' data output will be for switches, especially when we have multiple inputs... we don't want to sink time into a visualizer yet.

I like the idea of the players prototyping visualization/ranking tools on Fusion Tables -- basically what you all are doing right now!

(1) For the binding curves (blue and red lines)

How about instead... google sheets allows creation of sparklines

so what if we posted the raw data vs. FMN concentration -- would you be able to then fuse it into one of the current sheets and create lines in situ?

Can someone prototype this -- e.g., by posting an example of showing mini-bar-graphs with, say base pair frequencies? That will then motivate us to post a file with all the data, and then you all can visualize 'at will'.

(2) As for visualizing the RNA-secondary structure in the sheets, not sure what to do... presumably would need .png display? we do have some code to autogenerate the .pngs from secondary structures... somewhere...
johana, Researcher
The sparklines sound like an interesting idea.

Just in case all the fluorescence data are needed, I have posted a second table with all of the median values for MS2 fluorescence at each concentration:

https://docs.google.com/spreadsheets/...

It's harder to follow, but the two experimental series, without and with FMN, are listed with increasing concentrations of MS2. The blue and red curves are generated directly from the fit parameters using the following expression:

$$F = F_{\max}\,\frac{[\mathrm{MS2}]}{[\mathrm{MS2}] + K_D}$$
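To redraw a curve from the table, plug the fitted Fmax and Kd into this expression. A minimal sketch with made-up parameter values; `binding_signal` is a hypothetical name. At [MS2] = Kd the signal is exactly half of Fmax, matching the definition of Kd given at the top of the thread:

```python
def binding_signal(ms2_nM: float, fmax: float, kd_nM: float) -> float:
    """Fluorescence predicted by F = Fmax * [MS2] / ([MS2] + Kd)."""
    return fmax * ms2_nM / (ms2_nM + kd_nM)


# Hypothetical fit values for one design: same Fmax, Kd 20 nM without FMN, 200 nM with FMN.
fmax, kd_no_fmn, kd_with_fmn = 1000.0, 20.0, 200.0
for conc in (2.0, 20.0, 200.0, 2000.0):
    blue = binding_signal(conc, fmax, kd_no_fmn)
    red = binding_signal(conc, fmax, kd_with_fmn)
    print(f"{conc:7.1f} nM  no FMN: {blue:6.1f}  with FMN: {red:6.1f}")
```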
salish99
Yes, while we can harp on the compositional data quite a bit, I miss shape analysis entirely.
What I am looking for, to quantify it (and I'm just waxing here):
1) Number of hairpins and A) Average base length with stdev, B) min length, C) max length
2) Number of bulges and A) Average base length with stdev, B) min length, C) max length
3) Number of loops and A) Average base length with stdev, B) min length, C) max length
4) Onset of first hairpin (early, late, etc)
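Most of these counts can be pulled out of the dot-bracket strings already in the fusion table. A minimal sketch, assuming pseudoknot-free structures and the usual loop definitions (a bulge has unpaired bases on one side only, an internal loop on both, a multiloop encloses two or more helices). Loop sizes are counted in unpaired bases, the population standard deviation is used so a single loop doesn't raise an error, and "onset" is taken to mean the 1-based position of the first hairpin's closing base. All names are hypothetical:

```python
import statistics
from typing import Optional


def pair_table(structure: str) -> list[int]:
    """partner[i] is the index paired with i, or -1 if i is unpaired."""
    partner, stack = [-1] * len(structure), []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def loop_sizes(structure: str) -> tuple[dict[str, list[int]], Optional[int]]:
    partner = pair_table(structure)
    loops = {"hairpin": [], "bulge": [], "internal": [], "multiloop": []}
    first_hairpin = None
    for i, j in enumerate(partner):
        if j <= i:                        # unpaired, or the 3' side of a pair
            continue
        children, k = [], i + 1
        while k < j:                      # walk the loop closed by the pair (i, j)
            if partner[k] > k:
                children.append((k, partner[k]))
                k = partner[k] + 1        # jump over the enclosed helix
            else:
                k += 1
        unpaired = (j - i - 1) - sum(q - p + 1 for p, q in children)
        if not children:
            loops["hairpin"].append(unpaired)
            if first_hairpin is None:
                first_hairpin = i + 1     # 1-based position of the closing base
        elif len(children) == 1:
            p, q = children[0]
            left, right = p - i - 1, j - q - 1
            if left and right:
                loops["internal"].append(unpaired)
            elif left or right:
                loops["bulge"].append(unpaired)
            # left == right == 0 is a plain stacked pair, not a loop
        else:
            loops["multiloop"].append(unpaired)
    return loops, first_hairpin


def summarize(sizes: list[int]) -> dict:
    if not sizes:
        return {"count": 0}
    return {"count": len(sizes), "mean": statistics.mean(sizes),
            "sd": statistics.pstdev(sizes), "min": min(sizes), "max": max(sizes)}


loops, onset = loop_sizes("((.((...))))..((..((....))..))")
print(loops, onset)  # {'hairpin': [3, 4], 'bulge': [1], 'internal': [4], 'multiloop': []} 5
print(summarize(loops["hairpin"]))
```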
johana, Researcher
I tried the Sparklines, but I'm not sure that they can plot several X vs Y datasets at once on a log-scale. I'd be happy to be proven wrong, though.
Omei Turnbull, Player Developer
Rhiju and Johan, will you be supplying the same static switch graph images for the next round? I certainly hope so. In the case of the MFE structure diagrams, I know there is Javascript code to create them, so that could potentially be used to prototype something that dynamically generates the images and combines them with design stats and the switch graphs. I don't think we could do this inside a fusion table itself (no Javascript support), but should be able to do in a Google Spreadsheet or a browser page that uses the fusion table as its data source.
Omei Turnbull, Player Developer
salish99, Meechl tells me that she has code that takes the dot bracket notation and identifies base pairings. Structure analysis of the predicted MFE folding is also available using the scripting capabilities of the Eterna UI. Do you know enough JavaScript (or another scripting language) to generate the stats you want?
johana, Researcher
The static switch graph images will definitely be provided the next time around. If you have any suggestions for improvements in these images, please let me know.
Meechl
salish99, I probably could get all that info for you, but it would take time that I should probably be spending on other work. :) I code everything in R, but if you or anyone else would like to look at or borrow any of my scripts, I'm happy to share.
salish99
Omei/Meechl, unfortunately I don't, but I will take a look at the scripts already around at some point...
Omei Turnbull, Player Developer
Johan, before getting your spreadsheet with the detailed stats, I had been assuming that the primary determinant of statistical error was the number of dots that were measured for each particular design. Of course the variance of the luminance measurements would also affect the standard error of the mean, but those didn't seem to vary all that much. Thus, I was making general judgements about the relative size of the standard errors based on the intensity of the dots in the graph.

But now, comparing the standard error values you report against my visual estimates, I don't see much relationship. For example, here are 6 designs that have almost identical FMax (No FMN) standard errors, yet the dots representing their data seem to differ by orders of magnitude.


Can you elaborate on how you calculate the standard errors?
salish99
I presume it is the square root of the sum of the squares of all the different standard deviations, including though not limited to: difference in luminosity, inaccuracies in concentrations of added compounds, variability in the ability of the tethers to attach well to the substrate, changes in phosphorescence as a function of temporal changes, inaccuracies of all the measurement devices used, exactness of the arrangement of bases in the RNA, amount of unintended cross-linking, and maybe a few others.
johana, Researcher
Good observation.

For these initial data, I am reporting the error from the least-squares fit to the median fluorescence value at each concentration. Since all designs have the same number of data points in the fit, and since the points are most of the time close to the fitted curve, it is normal that the associated errors are similar.

Is this the absolutely best way to estimate the true errors? Probably not. We found that the median is better than the mean, which is less consistent due to outliers that skew the distribution. These outliers don't pose the same problem when taking the median. In the future we would like to analyze the distribution of fluorescence and find even better ways of rejecting spurious signals (i.e., clusters that are not fit properly, bad images etc.), and perhaps correct for some of the sources of error that salish99 mentions. If the distributions are Gaussian, we could then do a standard weighted non-linear least-squares fit that incorporates the standard error of the mean at each point. Another option would be to do bootstrapping to estimate the errors.
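Both ideas can be sketched without any fitting library. The fit below is a least-squares fit of the single-site model F = Fmax [MS2]/([MS2] + KD) posted earlier in the thread: for a fixed Kd the model is linear in Fmax, so the best Fmax has a closed form, and Kd is found by scanning a log-spaced grid. The bootstrap resamples the clusters measured at each concentration, recomputes the medians and refits. This is an illustration on simulated data under those assumptions, not the lab's actual pipeline; the function names, grid range and noise level are hypothetical:

```python
import random
import statistics


def fit_binding_curve(conc, signal):
    """Least-squares fit of F = Fmax * c / (c + Kd); returns (Fmax, Kd).

    Kd is scanned on a log grid from 0.1 to 10,000 (same units as conc). For a
    fixed Kd the model is linear in Fmax, so the best Fmax is sum(F*x) / sum(x*x)
    with x = c / (c + Kd).
    """
    best = None
    for step in range(-100, 401):
        kd = 10 ** (step / 100)
        x = [c / (c + kd) for c in conc]
        fmax = sum(f * xi for f, xi in zip(signal, x)) / sum(xi * xi for xi in x)
        sse = sum((f - fmax * xi) ** 2 for f, xi in zip(signal, x))
        if best is None or sse < best[0]:
            best = (sse, fmax, kd)
    return best[1], best[2]


def bootstrap_kd(conc, clusters, n_boot=100, seed=0):
    """Fit the per-concentration medians, then bootstrap the clusters for a Kd standard error."""
    rng = random.Random(seed)
    _, kd_hat = fit_binding_curve(conc, [statistics.median(v) for v in clusters])
    boot_kds = []
    for _ in range(n_boot):
        # Resample the clusters at every concentration with replacement, then refit.
        medians = [statistics.median(rng.choices(v, k=len(v))) for v in clusters]
        boot_kds.append(fit_binding_curve(conc, medians)[1])
    return kd_hat, statistics.stdev(boot_kds)


# Simulated design (not real data): Fmax 800, Kd 50 nM, 15 clusters per
# concentration, 15% multiplicative noise on each cluster.
sim = random.Random(1)
conc = [1, 3, 10, 30, 100, 300, 1000, 3000]
clusters = [[800 * c / (c + 50) * (1 + sim.gauss(0, 0.15)) for _ in range(15)] for c in conc]
kd_hat, kd_se = bootstrap_kd(conc, clusters)
print(f"Kd = {kd_hat:.1f} nM, bootstrap SE = {kd_se:.1f} nM")
```

A weighted fit would divide each residual by that point's standard error before squaring; the grid search stays the same.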
Eli Fisker
The aptamer matrix

Lately I have been talking a lot about C and G segments in relation to the MS2 lab and them helping make the switch happen.

Aptamer sequences in green boxes, G and C segments highlighted with blue.


I kept seeing them pop up around the MS2 hairpin, but also around the aptamer and it got me wondering if something similar happened for FMN aptamers in general.

Which color door to take?

The MS2 lab results, sent me down a rabbit hole to the old switch lab data.

Image credit

Try to imagine that the stems around the aptamer are like a gate, each with a double door. Each strand is a door.

The image above looks like a pretty solid house. But really the aptamer house is more like a tent with one or two doors, and a person inside the tent as the molecule.



Some aptamer tents will have two doors, because both stems around the aptamer will be active in the switching area when the aptamer gets turned off.

But aptamers can also have just one gate at either end. The doors will have different colors. So the one door will be mostly green/blue and the other door will be mostly red. But which it will be will depend on different things.

While I was working, one question led me to the next. I started a spreadsheet and ticked off different options to get a clearer idea of what was going on.

Double gated aptamers

The clearest pattern showed up for the double gated aptamers, the aptamers that have the switching area on both sides. There, the first two doors on the right-hand side, going through the aptamer from the sequence start, tend to be mainly green and blue, especially if there are no multiloops in either state. Most of these switches were also short-range moves.


Spreadsheet: Patterns for segments in switching area next to aptamer

Example:




And here is the reason why this pattern happens in short-range switches where all the stems around the aptamer move: the locked aptamer sequence itself is creating the repeat.



Something Brourd has pointed out.

Gate before the aptamer (Switching area before FMN)

The aptamers that had a switching area before them were among the higher scoring switch labs. Most of them were also only partially moving switches, which I have pointed out earlier is a far more successful strategy for solving a switch than having the whole switch move, unless the fully moving switch is short. See: Different types of switches

These aptamer gates have a different pattern for the first door. Some have green, others have red.

However, what does fit is that if the switch moves backward (the aptamer door pairs up with a sequence with a lower base number than it had before), then the door is red. Those were short-range switches.

Example of red switching gate, moving backwards.


If the switch moves forward - the aptamer door pairs up with a sequence with a higher base number than it had before - then the door is green. Those were long range switches.
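The backward/forward distinction can be checked mechanically from the two predicted structures, for example the dot-bracket strings in the fusion table. A minimal sketch, assuming the first structure is the "before" state, that "base number" means 1-based sequence position, and pseudoknot-free structures; it only reports the direction and does not try to assign door colors. Function names and the toy structures are hypothetical:

```python
def pair_table(structure: str) -> list[int]:
    """partner[i] is the index paired with i, or -1 if i is unpaired."""
    partner, stack = [-1] * len(structure), []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def move_directions(before: str, after: str) -> dict[int, str]:
    """Map each base (1-based) that is paired in both states, but to different
    partners, to 'backward' if its new partner has a lower base number, else 'forward'."""
    if len(before) != len(after):
        raise ValueError("structures must have the same length")
    old_partner, new_partner = pair_table(before), pair_table(after)
    moves = {}
    for i, (old, new) in enumerate(zip(old_partner, new_partner)):
        if old >= 0 and new >= 0 and old != new:
            moves[i + 1] = "backward" if new < old else "forward"
    return moves


state_1 = "((((....))))........"
state_2 = "........((((....))))"
print(move_directions(state_1, state_2))  # {9: 'forward', 10: 'forward', 11: 'forward', 12: 'forward'}
```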

Red door first - aptamer before or after switching area

These 3 labs with the red door first also stood out in one more way: they have their magnet sequence before the aptamer, placed in a single-base area. 2 of the labs have it in the hook area (Stratospheric and My screw up) and the last one has it in a multiloop ring area (Top Notch). All of them have the magnet sequence at short range before the aptamer.

I have earlier mentioned multiloops as being good. I suspect multiloops actively help make the switch happen and are good to have, at least in the unbound state. And some of the best scoring labs had a multiloop in both states. I think one may have more options for switching with a multiloop (Top Notch) and 2 stems moving than with a big internal loop with one stem moving and a loop area becoming a stem. Though the Top Notch lab shows that it is possible.

Green door first

Most of the time, the first aptamer door one goes through will be green. That's the overall tendency.

Gate after the aptamer (Switching area after FMN)

There were only very few switch labs of this type.

Mixed pattern for aptamer door/s

Some labs have mixed patterns, with doors that are neither green nor red. But I suspect we can still use the knowledge that a certain pattern occurs often, even though we cannot yet fully predict the color and bases around the aptamer.

Not every switch lab has the aptamer closed by double G's paired with double C's, or by similar G- and C/U-heavy strands. For short stems after an aptamer, even lines of A/U's can sometimes do the job. However, if the design can't use the stronger C element, there is often a number of U's at a similar spot instead.

MS2 segments as door magnets

I am starting to think of the G and C segments that help get the MS2 hairpin moving as magnets: magnets that catch the door when it swings open, in addition to closing the aptamer loop.

So there are both doors and magnets for switching. I think this is why the aptamer house with a single door is often closed with, e.g., two GC pairs lining up, even though that is a bad choice for something one wants to get moving. But these closings are short, usually 2, 3 or 4 base pairs long, and when they are 3 or 4 long, not all the base pairs are GC's in line.

Machinelves: Physics and chemistry is all about balance and proportion, so it doesn't matter if GC is strong, if there is an equal weight pulling elsewhere

I also think this is why the magnets that pull the MS2 hairpin open are usually a bit longer than those normally needed to open the aptamer doors. Meanwhile, an aptamer door and the MS2 magnet sometimes combine to do the task together. As the MS2 sequence haves

So why am I talking about these aptamer doors? Because I think it matters what colors we paint them, and that will depend on where the gates are and how many there are; on whether the sequence near the aptamer moves backward or forward compared to the sequence it pairs with; and on whether there are multiloops in one or both states. This in turn will determine which magnet sequence to use in the design, and where.

I have been seeing something else in the old switches, which kind of makes sense now. The leading first door (in, out, or both) is most often green or bluish. That leaves the last aptamer door, walking along the sequence, red. This door will often want an outside magnet sequence to help it open up, which can explain why there are often one or several stretches of blue and green magnet sequence outside the aptamer area.

Aptamer doors versus MS2 segments

I have been struggling to get my mind around this segment thing, because I saw the G and C segments show up around the aptamer sequence and sometimes also around the MS2 sequence, but they didn't seem to work in the same way.

The aptamer has a gate made of two doors, which needs to be closed in one state but open in the other. The MS2 hairpin, however, needs to have its G and C segments (if it has both) out of sync. It can't have them both close to the MS2 sequence as if they were doors, because then they would rather pair with each other and never let the MS2 sequence out to play. So the MS2 hairpin would not open up.

Machinelves suggested they could be swivel doors like in a western saloon, never locking closed, and with each door placed far apart, not even across from each other.

Or instead, try to imagine that the MS2 hairpin has one set of doors hidden up inside the hairpin. Each door can then go on to form one or two gates with outside magnet segments.

Often the red segment of the MS2 lies far after the MS2 sequence, sometimes fusing with the red door of the aptamer, and sometimes even using the red bases in the aptamer itself.

The FMN aptamer and the MS2 sequence have one thing in common: they both often like to use G and C hook segments, lines of C's and G's that they themselves can grab onto to get the switch moving.

MS2 element is made for switching (Has 3 out of 4 as G’s, has 3 C’s in line)
FMN aptamer element is made for switching. (Has two double G’s in line)
Orange marked area shows where aptamer doors often turn up.



Sum on aptamer gates

I basically see the same mechanism in play in the MS2 labs for both the aptamers and the MS2 hairpin. But I also see that FMN aptamers share patterns in how they like to get solved, depending on whether the sequence around the aptamer moves backward or forward, and on which of the stems around the aptamer is in an active switching area.

But unlike the MS2 segments, which are usually placed apart because they are generally longer and lines of C's and G's love to pair, the G and C segments for the aptamer are often smaller and weaker, and often pair so as to close one end of the aptamer: the side involved in the switching.

Most of these double-gated aptamers also had the whole switch moving, which I have mentioned earlier is less than optimal for gaining a high score. (Different types of switches)

So I wonder which will be best. I think one of the 3 types of aptamer doors will be better than the others. For now, the most successful ones are the few that start with a red door, are short-range switches, and have the switching area before the aptamer. I wonder how the aptamers with a switching area on both sides will do in designs that have a smaller actual switching area. Will they do better if given better starting conditions? My bet is mainly on single-gated aptamers, and for now a switching area before the aptamer looks best. But when other elements like MS2 bricks need to get involved, I'm in favor of having the MS2 held between the aptamer sequences. The MS2 lab with clearly the worst average Eterna score was Exclusion 1, which had the MS2 sequence outside, not between, the two aptamer sequences. (Source: Omei's screenshot)

Acknowledgement

Thx to Machinelves for listening to my lab talk, commenting and helping me visualize and put mental images to this - with gatekeepers, aptamer houses and swivel doors.

Perspective

I didn't start out on this journey with any particular aim, but the aim became an effort to predict the sequence around aptamers and the MS2 magnets. I can't do this fully, but as I have tried to show, there already seem to be tendencies for aptamer-closing segments in the small set of switch labs we have solved.

I have pointed out that these patterns turn up in specific kinds of aptamer settings and seem to get the switch solved. But do they also indicate that the aptamer will work well? This idea may need some adjustments to get the best possible overall switch while keeping solid aptamer binding too.

I also think there is something like a good magnet position before and/or after the MS2 sequence, just like with the aptamer house. This will depend partly on the position of the MS2 sequence relative to the aptamer sequences. But I think there will be a pattern for when a red or a green magnet sequence is needed before or after the MS2 hairpin, and how close to the MS2 sequence it should be placed, likely also depending on which way the switching area moves.

We still have rather few switch labs, but I suspect a clearer picture will arise when we get more. I hope this will help as a start.

I'm certain there are many more factors to take in and more connections to be gained from the spreadsheet and from more lab data. You may see other connections than I do. Please bring up what YOU see, so we can all get better at solving switches for lab and help each other further science and medicine. Your turn. :)

Resources
Image document: Where is my switching area versus my aptamer
Spreadsheet: Patterns for segments in switching area next to aptamer

salish99

nice ideas

Eli Fisker

Thx, Salish :)

rhiju, Researcher

based on this magnet/door picture, can you make any predictions? That is, do you see any pairs of sequences that johan is synthesizing in the current FMN/MS2 round where you can predict one design will switch better than the other? [Or will we need to design special experiments in the next round to explicitly test?]

Eli Fisker

Hi Rhiju!

I have tried to answer your question in this post below:

https://getsatisfaction.com/eternagam...

salish99

I would assume that these are the calculated, and not the measured shape data, Eli.

Eli Fisker

We don't have SHAPE data for the MS2 labs, more like fluorescent green signals. So we don't know the exact shapes. You can read a bit more about the experiment here:

http://eterna.cmu.edu/web/lab/4736274/

salish99

my point exactly

Eli Fisker

Reverse engineering magnet segments

I just got myself a very good laugh.

I was browsing the topic of aptamers on the internet and stumbled on one of my old Eterna bot strategy suggestions about aptamers in static labs. Back then I saw magnetic segments of C's elsewhere in the design, dying to pair with the aptamer twin G's if they were close either in sequence or in geometric space, and I very much disliked it.

Back then I was trying to ban them in the static labs, for the exact same reason that I now like them in switching designs with aptamers and/or MS2 hairpins. :)

[Market strategy] Aptamer designs with red twin sequences

Eli Fisker

Past strategy resurfaced

I found another one of my past strategies. This one was an attempt to ban specific types of base repeats that I had seen time and time again act like stem splitters in single-state labs.

They look suspiciously like the switching segments I see for the MS2 hairpin and the C/U magnet sections in our past switch labs. :)

Strand repetition ban

Eli Fisker

Skipping magnet stones on the sequence sea

Let's imagine that the locked sequences (aptamer and MS2 hairpin) are like skipping stones...


Image credit

I have started to think of the magnet segments as coming from skipping stones: repeat sequences in the locked base area that get thrown into the sequence water and spread color ripples in a particular direction. Not quite like rings in water, more like a ripple with a specific direction. Sometimes the ripple ends up later in the sequence, after the aptamer or MS2. Sometimes it ends up earlier in the sequence, before the aptamer.

The original stone that gets thrown is the strong repeat bases that come from the elements, like the FMN aptamer or the MS2 hairpin. The skips are the magnet segments elsewhere in the sequence that make the switch happen, by flipping between the complementary/strong bases in the aptamer/ligand and the magnet segment(s) outside.

I also suspect there is a connection between the moving direction and the direction in which the magnet segment ripples spread in the sequence water.

Magnet segments and ripples

I think two things drive color ripples and magnet segments in a sequence:

- Ratio of locked bases.

The higher ratio of locked bases, the more bases are destined to be predetermined.

- Repeat bases in the sequence

Strong repeat bases inside a locked element are like optimal skipping magnet stones for making a switch happen. They will have far more impact and spread more ripples, especially if there are strong base twins or triplets like CC, CCC or GG, GGAG. These are the magnet elements inside the aptamer and MS2 sequences. A magnet segment of Cs will spread ripples of Gs, and a magnet segment of Gs will spread ripples of Cs and/or Us. The latter effect explains the extensive C and U sequence repeats in switch labs with the FMN aptamer.
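The two factors above can be counted directly from a design. This is a minimal sketch under stated assumptions: locks are given as a mask string where 'x' marks a locked base (a hypothetical format), and ACAUGAGGAUCACCCAUGU is taken as the MS2 hairpin sequence for illustration (an assumption; check it against the locked bases of the actual lab). It reports the locked ratio, runs of one base (twins, triplets), and 4-base windows rich in one base, such as GAGG.

```python
import re


def locked_ratio(lock_mask: str) -> float:
    """Fraction of positions marked 'x' (locked) in the hypothetical lock mask."""
    return lock_mask.count("x") / len(lock_mask)


def repeat_runs(seq: str, min_len: int = 2) -> list[tuple[int, str]]:
    """(0-based start, run) for every run of a single base of length >= min_len."""
    return [
        (m.start(), m.group())
        for m in re.finditer(r"A+|C+|G+|U+", seq)
        if len(m.group()) >= min_len
    ]


def rich_windows(seq: str, base: str, window: int = 4,
                 min_count: int = 3) -> list[tuple[int, str]]:
    """Windows of `window` bases holding at least `min_count` copies of `base`."""
    hits = []
    for i in range(len(seq) - window + 1):
        chunk = seq[i:i + window]
        if chunk.count(base) >= min_count:
            hits.append((i, chunk))
    return hits


ms2 = "ACAUGAGGAUCACCCAUGU"  # assumed MS2 hairpin sequence
print(repeat_runs(ms2))             # [(6, 'GG'), (12, 'CCC')]
print(rich_windows(ms2, "G"))       # [(4, 'GAGG')]
print(locked_ratio("xxx......."))   # 0.3
```

Runs catch the strict twins and triplets (GG, CCC); the window scan is there because a segment like GAGG is G-heavy without being a pure run.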

Here is an example of an FMN aptamer lab with a spread of blue/green ripples.

This is a full moving switch, spreading C/U ripples (blue boxes) after the aptamer sequence (orange box). C segment echoed forward (bases 31-33).





Here is a partially moving switch, with contained ripples. G segment (bases 18-19) echoed backward once.





To switch globally or locally

Now I think I understand why it helps to reduce the total switching area of a switch, so that it includes only a confined area rather than the whole structure. By having only a partial switching area, as opposed to a full moving switch, one stops the skipping stone from sending repeat sequences outside the switching area, thus reducing the number of already predetermined bases that spread from the locked sequence and its magnet segments.
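One way to put a number on "partial" versus "full moving" is to count the positions whose pairing changes between the two target structures. This is a hedged sketch that assumes the switching area can be approximated that way from dot-bracket targets; the names and toy structures are hypothetical, and it says nothing about energetics.

```python
def pair_table(dot_bracket: str) -> list[int]:
    """Map each 0-based position to its partner, or -1 if unpaired."""
    partners = [-1] * len(dot_bracket)
    stack = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partners[i], partners[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partners


def switching_area(state1: str, state2: str) -> list[int]:
    """0-based positions whose partner (or paired/unpaired status) differs."""
    if len(state1) != len(state2):
        raise ValueError("structures must have equal length")
    p1, p2 = pair_table(state1), pair_table(state2)
    return [i for i, (a, b) in enumerate(zip(p1, p2)) if a != b]


def moving_fraction(state1: str, state2: str) -> float:
    """Share of the sequence that belongs to the switching area."""
    return len(switching_area(state1, state2)) / len(state1)


# Full moving toy switch: the stem re-closes on a different stretch.
print(moving_fraction("((((....))))....", "((((........))))"))  # 0.75
# Partial toy switch: the first hairpin stays put, only the second stem moves.
print(switching_area("(((...)))((....))", "(((...)))........"))  # [9, 10, 15, 16]
```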

I also suspect it matters a good deal which aptamer one puts together with the MS2, because a huge part of the bases is likely already predetermined by the locked MS2 and aptamer bases, and that forces other bases to be predetermined too. However, it helps a lot that we are not locked into a specific structure in the lab, meaning we can change the structure to fit what we think is best for a solve.

This perhaps matters even more for the MS2 labs, where I think I counted around 37% of the bases as already locked. And since those locked bases contain repeats, the repeat sequences spread on, something Brourd demonstrated (here).

Not all aptamers are alike

But not every locked sequence has the exact same effect. Just like skipping stones are not alike. Some stones are wide and create big ripples. Others are smaller and their rings and effect die out fast.

Brourd brought up that other aptamers, like the TEP aptamer, didn't create the same pattern of repeats in the puzzles that I found in the switches with the FMN aptamer. (Periodic repeats in RNA switches - How can they be programmed)

The TEP aptamer doesn't contain repeats the way the FMN aptamer and the MS2 hairpin do.

Aptamer and ligand qualities

Imagine the following:

TEP aptamer - a small stone - has few locked bases. It is a weak magnet for creating repeats, since it does not itself contain repeat bases. However, strong bases like Gs and Cs beside it can still be used to hook or grab onto a magnet segment outside.

FMN aptamer - a middle-sized stone - has a lot of repeating bases. Its two sequences are close to being repeats of each other, plus there is an A repeat and 2 strong twin G magnets.

MS2 hairpin - a big stone - has many locked bases. It is strong because of its repeating base pairs, and it also contains two very strong doors for switching: the triple C magnet and the GAGG magnet.

So the MS2 hairpin will have strong effects just due to the amount of locked and repeating bases, and it has good options to induce repeat magnet segments elsewhere. (Especially Gs and to some extent Cs.) But while the MS2 hairpin has two even stronger magnet segments inside, they don’t always both come into play.

The FMN aptamer, on the other hand, will have a big impact mainly because of its many repeats (especially CUs). However, I think the TEP aptamer will be weak on both accounts: magnet segments and inducing repeats.

I consider the FMN aptamer to have the most skipping stone like quality. Even in shape. ;)

The two halves of the aptamer sequence are almost mirror repeats of each other. On top of that, they contain AA and GG repeats.



MS2 sequence and how it opens

Back to the MS2 department again. Not all designs use G or C magnets to open up the MS2 sequence.

Some magnet segments go for the beginning of the MS2 sequence, others go for the middle, and yet others go for the last part of the sequence. And then there are some mixes. Actually, I would have expected something else to work for the MS2 hairpin: that one would open the parts furthest away from the loop. But what seems to work well is going for the strongest bases buried deep down in the MS2 hairpin.

MS2 hairpin sequence


As Machinelves put it: the strongest bonds stick the hairpin together, but they are also able to form the strongest attraction elsewhere to unglue it.

So one can open the MS2 hairpin in many ways, but most will require longer magnet segments if one chooses not to use the shorter, stronger magnets, such as sequential repeats of Gs and Cs.

I concluded a long time ago that the more base pairs one wants to switch, the harder it is to make a successful switch. I think there will be a pattern to how many segments one uses and how long they are. If there are many that are too long, I think it will be much harder to get the switch moving. (Different types of switches)

Color Ripple effect

What I see so far is that the door color to pick for the FMN aptamer is related to which way the aptamer sequence is moving. The sequence around the aptamer that is in the switching area can move forward or backward, or at times even both, when both sides of the sequence around the aptamer move.

Hypotheses:
If the aptamer’s top gate sequence moves backwards, then the color ripple could be found earlier in the sequence.
If the aptamer’s top gate sequence moves forwards, then the color ripple could be found later in the sequence.
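The hypotheses above suggest a simple sequence scan: take a strong locked segment (for example an aptamer twin G) and list the segments elsewhere that could pair with it, noting whether each sits earlier or later in the sequence. A rough sketch, assuming Watson-Crick plus G-U wobble pairs and a hypothetical minimum gap of 3 bases between the two segments; it ignores folding energies and structure entirely, and the toy sequence is made up.

```python
# Watson-Crick pairs plus the G-U wobble.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def can_pair(seg_a: str, seg_b: str) -> bool:
    """True if seg_b can pair with seg_a base by base in antiparallel orientation."""
    return len(seg_a) == len(seg_b) and all(
        (a, b) in PAIRS for a, b in zip(seg_a, reversed(seg_b))
    )


def magnet_candidates(seq: str, start: int, length: int, min_gap: int = 3):
    """List (start, segment, side) for segments that could pair with seq[start:start+length].

    side is "before" (ripple earlier in the sequence) or "after" (ripple later).
    """
    target = seq[start:start + length]
    hits = []
    for i in range(len(seq) - length + 1):
        far_before = i + length + min_gap <= start
        far_after = i >= start + length + min_gap
        if (far_before or far_after) and can_pair(target, seq[i:i + length]):
            hits.append((i, seq[i:i + length], "before" if far_before else "after"))
    return hits


toy = "CCAAAGGAAAUU"  # made-up sequence with a GG "twin" at positions 5-6
print(magnet_candidates(toy, 5, 2))  # [(0, 'CC', 'before'), (10, 'UU', 'after')]
print(can_pair("GG", "GG"))          # False: twin Gs cannot pair with each other
```

The G-U wobble is what lets a G twin pick up U as well as C partners, which matches the C/U ripples described for the FMN labs; dropping ("G", "U") and ("U", "G") from PAIRS would restrict the scan to C partners only.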

I am wondering about things like which end of the aptamer has the sequence involved in a switch, and whether the switch happens from stem to loop, from hook to stem, or from stem to stem, etc. Those factors may also have a role in determining the outcome of the hypotheses I posted above.

But basically I see the repeat sequences in both the MS2 and the aptamer as thrown stones, creating a trajectory of ripples elsewhere. This is especially true of the stronger segments, like the 2 twin Gs in the aptamer, but also the 3 Cs in line in the MS2 and the 3 Gs out of 4 bases in the MS2 hairpin. Of course, the non-repetitive part of the locked sequence also spreads repeats. But if one has a strong means of making a switch happen, with a small but strong magnet sequence, this may happen more regularly.

Color ripple interference

So imagine two stones are thrown and their ripples intersect. This could cause interference in a bad way, in a good way, or even neutrally.

If two elements are too close, they may not allow one or the other, or both, to have their optimal sequences around them, and as such they could inhibit or neutralize each other. Similarly, if two elements are too far apart, they may not benefit from each other the way they could. However, with an optimal distance and position relative to each other, they may enhance each other through mutually beneficial, shared switch magnets.

Echo calculator

Thx to Machinelves for inventing the term above and for general discussion. She said the following, which I really like:

“Interesting point on interference, you're right about the increasing complexity with each variable introduced. Maybe those effects and complexity itself could be part of the calculator.”

If two elements are too similar, they may cause a wave top or bottom. It may be good and it may be bad. So basically we don't want the ripples to meet in a way that prevents the structure from being solved. Like where the aptamer demands a certain color segment for its closing, but then ripples caused by repeat bases or locked bases from the MS2 hairpin prevent the most optimal base colors from being placed at that spot.

Some of this interference can be avoided by designing good structures, but I see these ripples as potentially responsible for which solves can be made and which structures will work. Limiting or enhancing effects can in themselves be a function of interference.

I think it will matter a great deal where elements are placed in relation to each other for them to work optimally. For example, the G segment of the MS2 hairpin often likes a placement after the MS2 sequence, and the lab Exclusion 4, which had the MS2 at the end of the sequence, did worse on the switch subscore. (Source: Omei's screenshot)

It could only have magnet segments in front of the MS2 hairpin, losing out on some of the switching power that comes from having segments both before and after.

Basically, I think two elements might end up being too similar. That can be good if they can be made to share a segment, as in some of the MS2 labs. But it may also be bad if it prevents a solve from happening, because some ripples cancel each other out. So my aim is to see whether I can identify which situations cause ripples in which direction for aptamers, and similarly for the MS2. Since the Feynman prize is also about microRNA, segments in those will add ripples too, but different ones.

If the elements are placed unluckily in relation to each other, they may cause more bases to be predetermined than necessary, leaving the design less malleable for making a switch happen. But if done right, two segments may get fused, or two elements may otherwise support each other well.

I think we can use these interferences to our advantage, instead of being at their mercy, if we understand when, where and why they hit.

Big thx to Machinelves for edits and for discussion.

Eli Fisker

Switch magnet color coding

The other day I started drawing color codes and symbols for the aptamer and MS2 gate things I have been talking about, because I think symbols and colors are better than a spreadsheet for showing connections. I have sorted the switch labs by type, according to where the switching area is in relation to the aptamer. (I excluded Will it bind for being very different from the others, and I haven't done color coding for our first switch labs.)

As I already mentioned, some things stand out for the aptamer gates and doors: they often get a particular color.

The door colors around the aptamer are mainly a product of the repeat sequence in the aptamer itself and of which part around the aptamer is moving, and in which direction, more than of what makes the switch happen. They still want their preferred colors, or else they might get grumpy and get in the way of a switch. But what really makes the switch happen most of the time are the twin G's. There are a few exceptions, like Alive and kicking. One lab even has both of its twin aptamer G's in action (Anchor).

Magnetic poles

I think I came up with a better image for the magnet segments I have been talking about. They are like a north or a south pole.


Image credit

Let's say that the two pairs of twin G's in the aptamer are north poles. They repel; they won't pair with each other.


Image credit

So something is needed to close the aptamer, because it can't close itself. That's where the red and green doors come in.

The MS2, however, can close itself: its line of C's is like a south pole magnet and the G's are the north pole. But it needs to open up, and this is where the hook magnets outside the MS2 sequence come into play.

Switching area before the aptamer


Switching area before and after aptamer


Notice the mostly green aptamer doors (beginning of the sequence, left) and the mostly blue aptamer doors after the aptamer (pink line). These labs generally did less well. They mostly moved only a short distance, which is what provoked the color pattern around the aptamer in the first place, with their twin G magnets mostly pairing close by.

Switching area before and after aptamer (Top half) Switching area after aptamer (Bottom half)



Designs with the switching area after the aptamer typically have a blue-green door.

MS2 labs



Sum up

When I watch the MS2 labs, what stands out is that the labs that did better for switching and overall, were those that had more connections between their magnet segments.

Exclusion 1 and 4 did worst. Neither their aptamer twin G magnets nor the MS2 C magnets came into play, at least in most of the designs.

Exclusion 2 and 3 did better. Each of them had its MS2 line of C's in action, at least in several of the top scorers. They also had the first aptamer door in the color that aptamers preferred in past switch labs with the switching area before the aptamer.

SS1 and SS2 did exceptionally well for the winners because both labs have their MS2 C's and the G magnet in the aptamer active. They literally enhance each other for switching both on and off, because the aptamer G's pair up with the MS2 C's when the switch is not supposed to be on, ensuring both stay off. SS2 even has both of its aptamer G magnets active, besides the MS2 C's. SS1 has even more magnet segments in play than SS2, and its MS2 sequence, being first, is less within reach of the aptamer sequence; this might be why it did worse than SS2.

Suggestions for the future

I would love to see an MS2 lab in the Exclusion series similar to Same State 2, with the aptamer sequences first and last and the MS2 sequence in between, but with the MS2 sequence shifted a bit further toward the beginning than in the Same State 2 lab, where it was right in the middle.

The reason: I think if the MS2 sequence goes further to the right instead, then the aptamer magnet and red closing door will love to pair up with the MS2 C's. That will turn the switch off and on very well, but with the aptamer and the MS2 in the same state rather than each in its own state. So distancing the MS2 sequence from the second aptamer sequence will make it less likely that the MS2 C's end up switching off both the MS2 and the aptamer by pairing with the 2 red aptamer G segments. Meanwhile, having a bit of space before the MS2 might allow a small C or G segment to be put into play for pulling on the MS2 hairpin too, opening up the MS2 sequence from both sides. That can't be done in the labs where the MS2 sequence is right next to one of the aptamer sequences.

salish99

what is nincom poop?

Eli Fisker

Hi Salish!

It's the name of a past switch lab from the first big cloud lab run with 20 switch puzzles, most of which can be found here:

http://eterna.cmu.edu/web/labs/past/?...
http://eterna.cmu.edu/web/labs/past/?...

And our all time first switches:
http://eterna.cmu.edu/web/lab/3376078/

Eli Fisker

Rhiju asked for explanations to some of my symbols in my drawings in the above post.

https://d2r1vs3d9006ap.cloudfront.net...

He asked what the locked symbols meant.

Answer: The lock symbol means that this side of the aptamer is not involved in the switching area, the part of the RNA switch that moves or is supposed to move.

He also asked about that purple line = MS2 sequence, since there was no purple line on that page.

Answer: I painted a purple line to explain that the purple line represents the MS2 sequence. What happened was that I put the explanation on the first page, but the purple lines only apply to the last page, with the MS2 labs.

rhiju, Researcher

@johan/nando -- can we set up additional puzzles like what eli suggests in the next round? that is, have the aptamer defined by base pairing of the very 5' and 3' ends, but make the puzzle an 'exclusion' puzzle for MS2?

@eli, i'll discuss with nando next week the possibility of allowing add/delete of residues so that players can also optimize the distances of each element from each other.

We haven't yet deployed that add/delete feature due to bugs and also because we know it will make gameplay more complicated. But first we need to know what tools advanced players *need* to make these crazy devices and then we'll work on gamifying them and introducing them gently to early players. So keep the suggestions coming.

Eli Fisker

I was looking at the new FMN sanity lab data.

http://eternagame.org/web/labs/past/?...

http://eternagame.org/web/lab/5422399/

The new switch lab results look very odd compared with the Cloud switches. It looks like we are missing half the data: there is SHAPE data for only 1 state.

What data we got for the early cloud switches


SHAPE data for both state 1 + 2




What we get now

No SHAPE data for state 2


One can only switch between target and estimate mode in state 1. One can't see whether the aptamer is bound or not.

If this is not an upload fault but an average of the scores for states 1 and 2, I suspect the overall switch score is too high, because state 2 always gets a full 100% score.
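The worry can be checked with simple arithmetic. Assuming, as suspected here, that the overall score is the plain average of a state 1 score S1 (0-100) and a state 2 score S2 that is always 100, every design gets a floor of 50:

```latex
\bar{S} = \frac{S_1 + S_2}{2}, \qquad S_2 = 100
\;\Longrightarrow\; \bar{S} = 50 + \frac{S_1}{2} \;\ge\; 50
```

So under that assumption a design scoring S1 = 60 in state 1 would be shown as 80 overall, and differences between designs are compressed by half.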

I asked Rhiju and it turns out that the lab is a mimics lab. I can see that past mimics labs also get a full 100% score for the state 2 that holds no data.

Is the same mimics sequence used for all these FMN sanity labs?



But mimics are not FMN aptamers

As mentioned in my earlier comment in this thread, FMN aptamers spread a particular pattern in the design.

https://getsatisfaction.com/eternagam...

Even should the mimic have the same energy value as the FMN it is mimicing, it likely won't have the same repeat base patterns - and thereby following affect on the design. And as such it will not be the same aptamer.

To analyse these switch labs properly and be able to compare them with the past switch lab data, we need to see SHAPE data for both states. It would also be most helpful to see them run with FMN aptamer too. That will give us data to compare with our past switches, and it ought to help with the mimics comparison too.

Brourd

The goal of the FMNsanity riboswitch targets is to find suitable candidates for use with the mimic protocol. As an example, here is a synthesized sequence from the target "Pentaloop by Hyphema"

http://eternagame.org/game/solution/5...

As any player can see, the chemical mapping measurements indicate with little doubt, that the target containing the FMN aptamer formed in the majority of probed RNA's.

It would be a significant waste of time, energy, and resources to run the mimic protocol or probe with (PLUS)FMN conditions for a sequence like this.

"Even should the mimic have the same energy value as the FMN it is mimicing, it likely won't have the same repeat base patterns - and thereby following affect on the design. And as such it will not be the same aptamer."

The designs that are a part of the FMNsanity targets are indeed the sequences as they would be in (MINUS)FMN conditions. So the secondary structure it folds into is the secondary structure present in (MINUS)FMN conditions.

The goal from this particular set of targets is to pull out those sequences that have been deemed to not fold into the (PLUS)FMN state, and run with mimics. This being a more efficient manner for determining if it is a suitable riboswitch, compared to running multiple sets of difficult and expensive, chemical mapping experiments on 400 sequences hoping for a "lucky hit" when there is a significant possibility that player designs will utterly fail to fold into the preferred (MINUS)FMN secondary structure.

salish99

Eli, I think this is the shape data calculated from the model, not measured (right?).

Eli Fisker

Hi Salish!

The SHAPE data for this lab is measured, just not for both states.

salish99

ya, thx.

Brourd

To clarify Eli's statement.

The chemical mapping measurements are for both target secondary structures, since the ensemble of RNA is probed. What this round of designs did not measure was both states in the +FMN (Flavin mononucleotide) conditions.

For those looking at the data, the ideal goal would be to observe and determine which sequences fold into the -FMN structure, and which sequences fold into the +FMN structure, without the presence of FMN.
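The goal Brourd describes, telling from measurements without FMN which target a sequence folds into, can be illustrated by asking which target's unpaired positions best match the reactive positions. This is a crude, hypothetical sketch, not the lab's analysis pipeline: it assumes one normalized reactivity value per base, a made-up threshold of 0.5 for "reactive", and toy structures and data.

```python
def agreement(reactivity: list[float], dot_bracket: str, threshold: float = 0.5) -> float:
    """Fraction of positions where 'reactive' (>= threshold) coincides with 'unpaired' ('.')."""
    if len(reactivity) != len(dot_bracket):
        raise ValueError("length mismatch")
    matches = sum((r >= threshold) == (ch == ".") for r, ch in zip(reactivity, dot_bracket))
    return matches / len(dot_bracket)


def best_target(reactivity: list[float], targets: dict[str, str], threshold: float = 0.5):
    """Return the name of the best-matching target and all agreement scores."""
    scores = {name: agreement(reactivity, db, threshold) for name, db in targets.items()}
    return max(scores, key=scores.get), scores


targets = {"-FMN": "((((....))))....", "+FMN": "((((........))))"}  # toy targets
react = [0.1] * 4 + [0.9] * 4 + [0.1] * 4 + [0.8] * 4             # made-up data
print(best_target(react, targets))  # ('-FMN', {'-FMN': 1.0, '+FMN': 0.5})
```

A real analysis would have to deal with noise, missing values and mixed ensembles; a single cutoff is only the simplest possible stand-in.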

salish99

Brourd, is the new data already out? I didn't see it in my experimental sets...

Brourd

Eli's query was referring to the FMNsanity riboswitch candidates, whose sequences were synthesized and sequenced in the 94th Eterna Round.

http://eternagame.org/web/lab/5422399/

Unrelated to Dr. Andreasson's pipeline, if that is what you were asking about.

salish99

wow, what a discussion... nice.

johana, Researcher

R93 results are in!

Eli Fisker

Sweet!

Thx for the happy news :)

Eli Fisker

Oh, I don't see the scores inside of eterna. Please, please, can we get the scores uploaded there also? :)

johana, Researcher

It's coming as soon as the time zones allow

Eli Fisker

Got it. Understand. And thx :)

Eli Fisker

Added in the post just below.

salish99

Found it. Thanks SO MUCH for already putting together the Excel sheet, saving me several hours of reading these values off your graphs. Thanks.

salish99

And, yay, I made #09 (9th highest score - 5498279), and #-12 (12th lowest score - 5477179) overall

Eli Fisker

Lol, Salish. Congrats. Love your humor.

Eli Fisker

I see there is a cluster number in the spreadsheet. I'm a bit uncertain how cluster is defined. Is it about data quality, or about the number of nearly identical designs? I can find many more designs that are very close in sequence than the cluster numbers in the spreadsheet indicate. OK, on rereading the lab conclusion, I think cluster here refers to data quality. Confusion cleared. :)

(Lab conclusion for MS2 round 2 for those who have not seen yet: http://eterna.cmu.edu/web/lab/5448678/)

This, however, leads to another question: what is normally considered a sufficient number of clusters for the data to be trustworthy?

Eli Fisker

Request for Lab Interface update

To aid the analysis of the many MS2 designs, and of labs in general, it would help a ton if we could see the name and score when we screenshot a design for explanation, instead of having to add them afterwards.

What I wish to be present in any lab design is the following:

- Lab name and/or Sub lab name
- Lab score
- Design title

In many labs not all of these are present by default.

Demonstration from labs with different parts of the design information lacking:

Omei Turnbull, Player Developer

+1

salish99

Alrighty, the fun continues.

First, let's take a look at the component scores and how they influence the overall scoring. No surprise, switching ability is important:



compared with the Folding and baseline subscores
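
The overall Switch Score is the sum of the Switch (0-40), Baseline (0-30) and Folding (0-30) subscores. A minimal sketch of how one might quantify which subscore tracks the total most closely is a Pearson correlation per subscore. The rows below are hypothetical, not Round 93 data, and the dictionary keys are illustrative rather than the spreadsheet's actual column names.

```python
import math

# Maximum of each component of the Switch Score (0-100), per the scoring description.
SUBSCORE_MAX = {"switch": 40, "baseline": 30, "folding": 30}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def subscore_correlations(rows):
    """Correlate each subscore with the total Switch Score (sum of the subscores)."""
    for row in rows:
        for key, top in SUBSCORE_MAX.items():
            if not 0 <= row[key] <= top:
                raise ValueError(f"{key}={row[key]} outside 0-{top}")
    totals = [sum(row[key] for key in SUBSCORE_MAX) for row in rows]
    return {key: pearson([row[key] for row in rows], totals) for key in SUBSCORE_MAX}


# Hypothetical designs, for illustration only.
rows = [
    {"switch": 40, "baseline": 30, "folding": 30},
    {"switch": 10, "baseline": 25, "folding": 30},
    {"switch": 25, "baseline": 30, "folding": 20},
    {"switch": 0, "baseline": 20, "folding": 30},
]
print(subscore_correlations(rows))
```

Because the Switch subscore spans a wider range (0-40) than the other two, part of its stronger correlation with the total is built into the scoring scheme itself.
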


Eli Fisker

Beautiful, Salish! I love what you have been up to. Keep the graphs and questions coming.

salish99

Thanks.

salish99

So, what properties are favorable for scoring?

Interestingly, there is a V-shaped dip in the Delta G values that lead to high scores: we see high scores both around -18 and around -14 kcal/mol, with very low scores (below 80) in between.



for the miR, there is no such dip:

salish99

in fact, the miR graph looks more like a VERY blurry geographical map of the UK... but that's just a side note

johana, Researcher

Interesting. It may be illuminating to break out the deltaGs (which are just the natural logs of the KDs) by puzzle.

I just wanted to clarify that the miR data for these puzzles should ideally work as negative controls: they should have the same deltaG (or KD) as the noFMN conditions. I have already noticed that for some designs this is not the case, which gives us a hint about the specificity of these switches.
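
The negative-control idea can be turned into a simple check: flag any design whose miR deltaG differs noticeably from its noFMN deltaG. This is a sketch under assumptions: the 1.0 tolerance, the column names `dG_noFMN` and `dG_miR`, and the design IDs are all hypothetical placeholders, not values from the lab.

```python
def flag_nonspecific(designs, tolerance=1.0):
    """Return IDs of designs whose miR-condition dG differs from the no-FMN dG by more than `tolerance`.

    The miR condition is meant as a negative control, so a specific switch
    should give (roughly) the same MS2-binding dG with miR as with no ligand.
    """
    flagged = []
    for d in designs:
        if abs(d["dG_miR"] - d["dG_noFMN"]) > tolerance:
            flagged.append(d["id"])
    return flagged


# Hypothetical values, in the same units as the spreadsheet's dG columns.
designs = [
    {"id": "design_A", "dG_noFMN": -17.0, "dG_miR": -16.8},
    {"id": "design_B", "dG_noFMN": -16.5, "dG_miR": -14.0},
    {"id": "design_C", "dG_noFMN": -15.0, "dG_miR": -15.9},
]
print(flag_nonspecific(designs))  # ['design_B']
```

A sensible tolerance would depend on the measurement error of the deltaG values, which this sketch does not estimate.
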

salish99

my text was truncated...
"...very low score (smaller than 80) in between."

ah, I used the smaller than sign - not recommended

salish99

In other words, for a good overall result, aim for a Delta G between -14.5 and -13.5. For an excellent result, aim for a Delta G between -18 and -17 (kcal/mol).

johana, Researcher

I have a clarification and apologize for potential confusion:
The deltaG in the spreadsheet refers to the deltaG of binding (i.e., the natural log of the KD), not of the RNA folding itself. It is possible that the V shape at high scores simply reflects that good switches have low and high Kds in their ON and OFF states, respectively.
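
To make this concrete: assuming the spreadsheet's deltaG is the natural log of the KD expressed in molar units (a 1 M standard state), reported in units of kT, which matches the "natural logs of the KDs" description earlier in the thread, the relation and two worked values are:

```latex
\begin{align*}
\Delta G_{\mathrm{bind}} &= \ln\!\left(\frac{K_d}{1\,\mathrm{M}}\right) \quad [kT] \\
K_d = 100\,\mathrm{nM}: \quad \Delta G_{\mathrm{bind}} &= \ln(10^{-7}) = -7\ln 10 \approx -16.1 \\
K_d = 1\,\mu\mathrm{M}: \quad \Delta G_{\mathrm{bind}} &= \ln(10^{-6}) = -6\ln 10 \approx -13.8
\end{align*}
```

Tighter MS2 binding (smaller KD) gives a more negative deltaG, so a good switch contributes one strongly negative value (its ON state) and one less negative value (its OFF state), which is one possible reading of the V shape.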

I believe that someone (Nando?) had a script for actually calculating the predicted deltaGs of RNA folding for the two states.

salish99

Ah, thanks for the note. I was wondering why it was so uniform, as I remembered some of my designs in the closer-to-zero range as well as some at -50ish. This makes sense.

salish99

I will separate them out by lab.
Question: do you need the R93 results separated from the other round (R87ish), or can I lump the results together into the six sub-labs?

johana, Researcher

I think that the R93 and R88 submissions are interesting mostly for seeing if we have learned something as a community (and I think we have!). For the plots, I think that separating by puzzle is already very informative.
Thanks for making these great figures!

salish99

Thanks, Johan. They aren't in orange's origin, so I would normally be skinned for being so lazy, but by putting some standardized effort into these Excel charts, they do become usable... glad you like them.

salish99

And, finally, the Kd values and how they relate to the final score:

Arrows indicate the favourable direction to aim at in molecule designs:





and the fold change

salish99

What I did not entirely understand is how we could get negative fmax values. In the following graphs, I marked in red the area where I would not have expected any results. The outlier at -24000 was especially remarkable compared to all the other results.

for FMN:



Without FMN:


And only the positive values with shape analysis:




johana, Researcher

Nice plots. This is great.

The negative fmax values are of course errors that probably resulted from the normalization process. I noticed a few of them but preferred to get the data out there for now. We will try to refine the filters in the future.
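
Until the filters are refined, a player analysing the spreadsheet could set such rows aside before plotting. A minimal sketch, assuming hypothetical column names `fmax_noFMN` and `fmax_FMN` (the real sheet may label them differently) and treating any non-positive or missing fmax as a normalization artifact:

```python
import math


def split_by_fmax(rows, keys=("fmax_noFMN", "fmax_FMN")):
    """Split rows into (kept, dropped) by whether every fmax value is a positive finite number.

    A maximum fluorescence signal should not be negative, so negative (or
    missing) values are treated as normalization artifacts and set aside.
    """
    kept, dropped = [], []
    for row in rows:
        values = [row.get(key) for key in keys]
        valid = all(
            isinstance(v, (int, float)) and math.isfinite(v) and v > 0 for v in values
        )
        (kept if valid else dropped).append(row)
    return kept, dropped


# Hypothetical rows for illustration.
rows = [
    {"id": "d1", "fmax_noFMN": 1200.0, "fmax_FMN": 900.0},
    {"id": "d2", "fmax_noFMN": 800.0, "fmax_FMN": -240.0},
    {"id": "d3", "fmax_noFMN": -15.0, "fmax_FMN": 300.0},
]
kept, dropped = split_by_fmax(rows)
print([r["id"] for r in kept], [r["id"] for r in dropped])  # ['d1'] ['d2', 'd3']
```

Keeping the dropped rows, rather than deleting them, makes it easy to check later whether the artifacts cluster in particular puzzles.
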

salish99

Thanks, Johan.

salish99

I am particularly interested in why fmax drifts towards lower values when Eterna scores fall below 50-60.

salish99

Thanks, Johan. While you are at it, could you add the 4 columns with %bases? Thx.

salish99

Looking forward to your comments.

PS: Is there a way to automate adding %C, %G, %A, and %U content to the xls table? I could calculate it from column R myself, but maybe this exists already?
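
Computing base composition is a short script once the sequences are exported from the sheet. A minimal sketch; the function name is hypothetical, and it assumes sequences are plain strings of A/C/G/U (T is treated as U, other characters are ignored):

```python
from collections import Counter


def base_composition(sequence):
    """Percent A, C, G and U in an RNA sequence.

    Case-insensitive; T is counted as U; any other characters are ignored.
    """
    seq = sequence.upper().replace("T", "U")
    counts = Counter(seq)
    total = sum(counts[base] for base in "ACGU")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or U")
    return {base: 100.0 * counts[base] / total for base in "ACGU"}


print(base_composition("GGAAACUUCC"))
# {'A': 30.0, 'C': 30.0, 'G': 20.0, 'U': 20.0}
```

Inside the spreadsheet itself, a formula such as =(LEN(R2)-LEN(SUBSTITUTE(R2,"G","")))/LEN(R2) gives the G fraction for row 2, assuming column R holds the sequence as the post suggests; SUBSTITUTE is case-sensitive, so the sequences should be uppercase.
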

Eli Fisker

Thoughts about the new MS2 data

I have started to put my thoughts down on the new data for Round 2.

Thoughts about MS2 - Round 2

Ye who enter here, have a playful heart.

There will be more questions than answers. And as usual, I’m aiming more at gaining an understanding than at getting everything right in the first place.

Hope you will enjoy and please share your thoughts on the results also. Thx to Salish for getting us well started.

Eli Fisker

Switches, unbalanced energy and multiloops

I think there is a magic range for energy in multiloops in switches. In single-state designs, multiloops like to have very negative energy in the loop itself; they most often have all-GC closing pairs, and those pairs are often oriented a certain way. Static multiloops like their energy low and negative, although only to a certain extent: they can also get too negative.

But a multiloop in the switching area of a switch can't be too stable; it needs to move. I have been speculating that the multiloop energy needs to stay fairly positive or, if negative, only slightly so, for the multiloop to help the switch move.

Although we can’t see actual results per base, I now think I see more indications piling up for what I have been looking out for for a long time.

I see monster positive multiloop energy in the switching area. :) Ok, some of them are barely on the positive side. But many are hugely positive.

So switch multiloops most often don't like all-GC closing pairs. The closing GC pairs they do have are regularly reversed, pointing the opposite way of what is often favorable in static designs. So a switch multiloop will rarely have all GC pairs, but will have AU and GU closing pairs mixed in as well. At least if it wants to be in a high-scoring switch :)

I have long suspected that having multiloops in one or both states actively helps the switching. I think switches benefit from having a multiloop in the switching area, and from what I have seen so far, that actually seems to be the case.

I also wonder: if a switch has multiloops in the switching area in both states, is there a pattern to which multiloop will have the most/least energy? I think there is. For the SS2 labs, it looks like when there are multiloops in both states, the multiloop energy in state 2 is more positive than in state 1.

Example from Vinnie's design Atgexeedri:



The energy balance needed between the multiloops in the two states may also turn out to depend somewhat on the design structure.

Spreadsheet: Multiloop energy between states in SS2

So my recommendation for future switch lab designs is quite the opposite of what I recommend for single-state designing: keep your multiloop energy towards the positive side. :)

Brourd

Using the free energy of 3+ way junctions (3WJ), aka multibranch loops, as a reference point in design is a slippery slope.

Case in point, ViennaUCT uses the ViennaRNA 2.0 software suite and the Turner 2003 parameters, which give significantly different free energy values for 3-way junctions.



3WJ almost always have "positive" free energy with these parameters, regardless of the closing base pairs.

What you may actually be observing within the confines of the RNA secondary structure and sequence is the innate difference between 3-way junctions and 2-way junctions (internal loops). When the RNA is designed to switch between a secondary structure that results in a 3+ way junction and a secondary structure that results in an internal loop, previous data has indicated that the internal loop is the preferred structure of the ensemble.

Granted, this observation is based on a very limited number of targets, but perhaps we may extrapolate a hypothesis from these results:

"Given an arbitrary pair of targets for a riboswitch, on average, we expect those that do better in terms of 'eternascore' to switch from like secondary structure to like secondary structure."

This hypothesis states that successful riboswitch behavior will more likely than not involve similar secondary structure elements switching to similar secondary structure elements, since the free energy calculations will be similar across targets, and independent of unknowns present within the tertiary structure.
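
The distinction between 3+ way junctions and 2-way junctions (internal loops) can be made mechanical from dot-bracket strings. The sketch below uses hypothetical helper names and is not part of ViennaUCT or the Eterna game code: it counts loops by how many helices meet at them, then checks whether two states switch "like to like" in the crude sense of having the same junction types. It ignores energies entirely, skips the exterior loop, and does not handle pseudoknots.

```python
from collections import Counter


def pair_table(structure):
    """Map each paired position to its partner for a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def loop_types(structure):
    """Count hairpins, 2-way junctions (internal loops/bulges) and n-way junctions (n >= 3).

    Each loop is identified by its closing pair; stacked pairs (2-way loops with
    no unpaired bases) are not counted, and the exterior loop is ignored.
    """
    pairs = pair_table(structure)
    counts = Counter()
    for i in sorted(pairs):
        j = pairs[i]
        if j < i:
            continue  # handle each base pair once, from its 5' side
        branches, unpaired, k = 1, 0, i + 1
        while k < j:
            if k in pairs:  # an enclosed helix starts here; skip over it
                branches += 1
                k = pairs[k] + 1
            else:
                unpaired += 1
                k += 1
        if branches == 1:
            counts["hairpin"] += 1
        elif branches == 2:
            if unpaired:
                counts["2-way junction"] += 1
        else:
            counts[f"{branches}-way junction"] += 1
    return counts


def same_junction_types(state1, state2):
    """True if both states contain the same junction types in the same numbers."""
    def junctions(s):
        return {k: v for k, v in loop_types(s).items() if k.endswith("junction")}
    return junctions(state1) == junctions(state2)


# Hypothetical toy structures.
three_way = "((..((...))..((...))..))"
internal = "((..((...))..))"
print(loop_types(three_way))
print(same_junction_types(three_way, internal))  # False
```

Under the hypothesis above, a target pair for which `same_junction_types` returns False (for example, a 3-way junction switching to an internal loop) would be expected to perform worse on average.
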

salish99

It may be worth running Cloud Lab-type experiments on, say, the 10 (or 20) best designs to see what their actual shape looks like. Just a thought.

Brourd

While the data would not directly reflect that derived from Johan's pipeline due to the lack of the MS2 protein and FMN, for the sake of curiosity, it would probably make for an interesting experiment.

rhiju, Researcher

Brourd -- thumbs-up on your hypothesis about 'preserving' topologies to make designs more robust to modeling errors. Very interesting.

salish99 -- big thumbs up on your idea. will talk to dev team about converting some or all of the cloud lab to testing designs that are vetted through the RNA array platform and then upranked by players-- would probably start happening this summer.

salish99

Thanks, Rhiju - good to see this will be done, looking forward to the results.

salish99

Alright, here are the Delta G values, separated by experiment:

Very interesting to observe the clustering.
At high EteRNA scores, SS1 and SS2 cluster at a Delta G (switching) of about -17 kcal/mol,
and Ex1, Ex2 and Ex3 in the vicinity of a Delta G (switching) of -14 kcal/mol.

More later.

salish99

To prove a previous point made by either Brourd or Johan, here is the dG without FMN.
And yes, when the KD off is high and the KD on is low, that leads to a good switch and high points, as the comparison between this graph and the last one shows.
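
The "high KD off, low KD on" reasoning can be summarised as a fold change and its logarithm. In the example puzzle at the top of this thread the switch is ON without FMN and OFF with it, so KD off would be the +FMN value; for other puzzle types, which condition counts as ON depends on the design goal. This sketch assumes both KDs are in the same units and is only an illustration, not the formula behind the Switch Subscore.

```python
import math


def fold_change(kd_on, kd_off):
    """Kd_OFF / Kd_ON; values above 1 mean MS2 binds more weakly in the OFF state."""
    if kd_on <= 0 or kd_off <= 0:
        raise ValueError("Kd values must be positive")
    return kd_off / kd_on


def ddg_kT(kd_on, kd_off):
    """Binding free-energy difference ln(Kd_OFF / Kd_ON), in units of kT."""
    return math.log(fold_change(kd_on, kd_off))


# Hypothetical KDs in molar: tight binding in the ON state, weak in the OFF state.
print(fold_change(20e-9, 400e-9))
print(ddg_kT(20e-9, 400e-9))  # about 3.0
```

Working with the logarithm turns the ratio into a difference of the two binding deltaGs, which is why it pairs naturally with the dG plots in this thread.
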

I wish there was a way to attach files to this forum...

johana, Researcher

Great. Beautiful plot!
The two peaks seem to correspond to the two different types of puzzles.

On another note, I calculated the dG for MS2 binding in units of kT and forgot to convert it to kcal/mol. Sorry about that. I'll try to convert it, or indicate the units better, in the future.
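
Converting from kT to kcal/mol is a single multiplication by RT. A minimal sketch; the function names are hypothetical, and the default of 25 °C is an assumption, since the measurement temperature is not stated in this thread and RT shifts slightly with it.

```python
R_KCAL_PER_MOL_K = 1.987204e-3  # molar gas constant, kcal/(mol*K)


def kT_to_kcal_per_mol(dg_kT, temperature_c=25.0):
    """Convert a free energy from units of kT (per molecule) to kcal/mol at the given temperature."""
    return dg_kT * R_KCAL_PER_MOL_K * (temperature_c + 273.15)


def kcal_per_mol_to_kT(dg_kcal, temperature_c=25.0):
    """Inverse of kT_to_kcal_per_mol."""
    return dg_kcal / (R_KCAL_PER_MOL_K * (temperature_c + 273.15))


print(round(kT_to_kcal_per_mol(1.0), 3))    # 0.592
print(round(kT_to_kcal_per_mol(-17.0), 1))  # -10.1
```

At 25 °C, 1 kT is about 0.59 kcal/mol, so if the clusters plotted around -17 and -14 are in kT, they would correspond to roughly -10.1 and -8.3 kcal/mol.
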

salish99

ah, thx Johan

nando, Player Developer

@salish99:

These last 2 graphs you posted inspire the following thought: maybe we should model the MS2 hairpin with a binding bonus, probably in the ballpark of -2 kcal/mol.

This would be easily done with Exclusion-type targets, but unfortunately, the current Flash code can't model 2 molecules in the same state, so we can't do this for the Same State ones...

salish99

excellent idea - would -2 be suitable, or should this be closer to the binding bonus, I wonder?

nando, Player Developer

It seems that the peaks of the clusters are separated by about 4 kcal/mol in these graphs. Taking into account that the MS2 binding is predicted to "help" in the Same State targets, and play against the switching in the Exclusion ones, it seems reasonable to use half of that amount in absolute value.

Incidentally, the idea is not mine. I recall that modelling MS2 with a bonus of -2 kcal/mol was mentioned in a conversation I had with the lab folks. So I believe that they are aware of the "problem", and I guess that it was left for later consideration because of the game implementation issue I mentioned above.
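
The effect of such a bonus can be illustrated with a deliberately simple two-state Boltzmann model: add the bonus to the free energy of whichever state displays the MS2 hairpin and see how the population shifts. This is an assumption-laden sketch, not how the Eterna game or ViennaRNA implements bonuses. It treats each target as a single structure with a known folding energy, uses the -2 kcal/mol figure from this discussion, assumes RT of about 0.5925 kcal/mol (roughly 25 °C), and the energies in the example are hypothetical.

```python
import math

RT_KCAL = 0.5925  # assumed RT in kcal/mol (about 25 °C)


def two_state_fraction(g_a, g_b, rt=RT_KCAL):
    """Boltzmann fraction of state A in a two-state model (free energies in kcal/mol)."""
    x = (g_a - g_b) / rt
    # Numerically stable logistic: avoids overflow in exp() for large |x|.
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def ms2_state_fraction(g_ms2_state, g_other_state, ms2_bonus=-2.0, rt=RT_KCAL):
    """Fraction of the MS2-hairpin state after adding a hypothetical binding bonus to its energy."""
    return two_state_fraction(g_ms2_state + ms2_bonus, g_other_state, rt)


# Hypothetical folding energies: the MS2-hairpin state is 1 kcal/mol less stable.
print(ms2_state_fraction(-30.0, -31.0, ms2_bonus=0.0))  # without bonus
print(ms2_state_fraction(-30.0, -31.0))                 # with -2 kcal/mol bonus
```

With these toy numbers, the MS2-hairpin state goes from a minority (about 16%) to a majority (about 84%), which shows how a bonus of this size could tip a design whose two states are close in energy.
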

salish99

sounds worth trying

salish99

has this been implemented by now?