Ways to analyze Lab results

Share ideas for how to analyze Lab synthesis results.  Please add your ideas or your comments/critiques on how this idea may be improved.


Here is what I am thinking about doing after Lab results are released.

Goal:  Understand pattern changes that create the most structural enrichment

Method:

1.  For each category, find design pairs with highest structural enrichment

         a.   Where structural enrichment between two designs = (change in Lab score) / (number of changed nucleotides) (see the sketch at the end of this post)

2.  Evaluate structural changes between these designs

3.  Look for commonalities across both categories and different pairs with max enrichment

After I do step 1, I will share in this forum which pairs show the greatest enrichment.  Then anyone can comment on what patterns they see or insights that arise.
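To make step 1 concrete, here is a minimal sketch (in Python, as one possible way to do it, not a finished workflow) of ranking design pairs within one category by the enrichment measure above. The field names ("id", "sequence", "score") are placeholders and would need to match however the lab data gets exported.

```python
# Rank design pairs within one category by
# structural enrichment = (change in Lab score) / (number of changed nucleotides).
from itertools import combinations

def changed_nt(seq_a, seq_b):
    """Number of positions that differ between two equal-length sequences."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

def top_enrichment_pairs(designs, n=200):
    """designs: list of dicts like {"id": ..., "sequence": ..., "score": ...} (placeholder keys)."""
    ranked = []
    for d1, d2 in combinations(designs, 2):
        if len(d1["sequence"]) != len(d2["sequence"]):
            continue  # this simple count only handles same-length designs
        dist = changed_nt(d1["sequence"], d2["sequence"])
        if dist == 0:
            continue
        enrichment = abs(d1["score"] - d2["score"]) / dist
        ranked.append((enrichment, d1["id"], d2["id"], dist))
    return sorted(ranked, reverse=True)[:n]
```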
Gerry Smith
Omei Turnbull, Player Developer
Gerry, I think this is a great initiative!  I know that I, and others, have looked at pairs of designs that stood out in this way.  But to the best of my knowledge, no one has ever started with a systematic search for all interesting pairs.

If your analysis tools support it, I'll suggest an enhancement for how you measure the difference between two designs.  Counting the number of changed bases is essentially finding the distance between the sequences using the Hamming distance.  This is useful for quantifying base mutations, but doesn't take into account insertions or deletions, which are very significant factors in RNA evolution. The Levenshtein distance (aka "edit" distance) reflects both.  That is, just as a single base mutation adds 1 to the distance, so does a single base addition or deletion.  As an example, if you have an 8-base paired stem and you shift one of the strands by one base, it will add as much as 9 to the Hamming distance.  But it will increase the Levenshtein distance by at most two -- a deletion at one end plus an insertion at the other.
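For anyone who wants to compute both measures, here is a rough sketch in plain Python (libraries such as rapidfuzz also provide fast edit-distance functions). The toy sequences at the end mimic the stem-shift example: shifting a strand by one base gives a large Hamming distance but a Levenshtein distance of only 2.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (or match)
        prev = cur
    return prev[-1]

# Toy illustration (not a real design): one strand shifted by one base.
s1 = "ACGUACGUA"
s2 = "CGUACGUAG"
print(hamming(s1, s2), levenshtein(s1, s2))  # 9 2
```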

Since a rerun of the puzzles from Round 2 of the OpenTB challenge is the next lab experiment, it would be fantastic if you were to start this off with the Round 2 data!
Gerry Smith
Just what I was looking for, Omei.  I copied the Levenshtein distance formulas so they work in my Excel, so I should be good to go.  Thanks.
whbob
In the R105 (OpenTB round 2) labs, the AB/C^2 DEC sublab scored very well. The AB/C^2 INC sublab was horrible; it would not switch.
Now the R104 and R106 AB/C^2 DEC and INC sublabs scored INC in the 80's and DEC in the 70's. Fairly well balanced. 
The difference between the rounds was that R105 used different A, B, C and reporter oligos.  I think that shifted the balance to favor the AB/C^2 DEC sublab and made the AB/C^2 INC sublab unable to switch.
The DEC sublab has the reporter with the TB-C oligos. The INC sublab has the reporter with the TB-A and TB-B oligos.
Looking at the data, in the INC sublab, KD100C should be a higher number than KD100A or B for high scores.  
In the R105 labs, AB/C^2INC had a KD100C number lower than KD100A or B.  What made KD100C so low?
Knowing how KD100A, B and C are calculated would help me understand what is happening.
In the R105, AB/C^2INC sublab, TB-C folds on itself with 3 base pairs as a stem when not binding.
In the R106, AB/C^2INC sublab, TB-C folds on itself with 5 base pairs as a stem when not binding. 
Does self attraction play any part in the KD energies?
Is it the relationship of TB-C to the reporter that affects KD100C? Can the lab shed any light (no pun intended) on this :)
rhiju, Researcher
I agree with whbob that the stark difference between DEC and INC in R105 is a major mystery. We'd love to solve it -- having both DEC and INC calculators in the test would lead to more robust TB diagnoses.
albermar
I like it 
Zama
I'm not sure this is where to ask, but here goes. I'm new to the TB puzzles and they are tough! I have a solution for the first one, A*B-RO, but haven't a clue if it's worth submitting. I shoved all the balancing to the center and left both ends for static stems and the like. I seem to have quite a bit of leeway on what to make there. Am I better off putting some spaces between the two little stems at the top end? Is the bottom stem too long? Should I keep A and B further apart? I included a picture of my graph work, which likely doesn't make sense but will at least show how I have things shoved in the center.  Any help or opinions would be greatly appreciated. Now back to my wrapping.

Zama
Gerry, it must be the Dane in me, lol.
Omei Turnbull, Player Developer
Zama, I really don't know what is best -- that's what the experiments are for.  I do think that arranging for adjacent stacks will, more often than not, improve the switching by lowering the energy of that state and help suppress unwanted foldings.  But it is also possible that lowering the energy of one state tips the energy balance toward that state to the point where the RNA always folds that way, whether or not the input molecule is present. But that is very informative too, because pairs of designs that are very close in sequence but different in behavior are the easiest to interpret. This is why I encourage you (and anyone else) to submit "interesting" variations with each design.  And right now, I think variation in the distance between stacks is one of the most interesting things to be investigating, because we know that the folding engines we are using do not account for adjacent (or very near) stems properly.
Zama
How about an MS2 at each end? I can't believe it stayed solved!
Omei Turnbull, Player Developer
In the OpenTB labs, there will be no testing with the MS2 protein, so these just become an ordinary static stem (with a 1-bulge).  My initial expectation is that there would be no significant difference between these and some other hairpin of comparable strength.  One thing that might happen, though, is that something in the rest of your sequence might, by chance, form a stack with some of the bases in the "static" stems, resulting in some level of "misfolding".  I usually check the dot plot after I have created a static stem, and adjust the stem sequence if there is any evidence that it might cause a mis-fold.

But one more thing to consider is that there is nothing wasteful about submitting two similar designs that vary in a way that you don't think matters, as long as you make sufficient notes so you can go back after the results are released and verify your belief.  Serendipitous discovery plays a key role in science.
Zama
Thank-you again! 
Gerry Smith
For review and comments:

Here are two files with the Top 200 "Enhancement Score pairs" for TB Round 2 105 A*B/C^2 Dec and Inc.  In these files, I have used the change in Eterna lab score between the pair as the numerator in Enhancement Score.   A couple of people have indicated that some sort of fold change might be more useful - so we can redefine "Enhancement Score" (and keep redefining it as better measures arise).

The denominator in Enhancement Score is the Levenshtein Distance (thank you Omei for this and for so many other helpful comments and corrections), which is a cool measure of similarity between string pairs.  Levenshtein Distance is the number of changes needed in one string (either additions, deletions or substitutions) to convert it into the other string.

The purpose of these files is to provide another easy, useful way to look at how structural changes between two similar designs affect Lab results.

There are many ways to group these pairs - which will make looking for structural pattern changes easy.

This analysis is easy to re-run with different variables. 

Link to TB Round 2 105 A*B/C^2 Dec
https://docs.google.com/spreadsheets/d/1Lt0ZbN1NYtoQTJzjlpNZurPn2O2cjp2j8NE9yy60mFM/edit#gid=2098394608


Link to TB Round 2 105 A*B/C^2 Inc
https://docs.google.com/spreadsheets/d/1zYJ4uWgsnJFRyyDU-n2hTqyqvAphl8HU-kSJijcYubs/edit#gid=455436096
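For anyone who wants to reproduce or tweak these numbers, the score itself is simple. Here is a hedged sketch with a pluggable numerator; the key names ("sequence", "score", "global_fold_change") are placeholders, and `levenshtein` refers to the edit-distance function sketched earlier in this thread.

```python
def enhancement_score(design_a, design_b, metric="score"):
    """Enhancement Score = |change in chosen metric| / Levenshtein distance.

    metric can be "score" (Eterna lab score) or, if we redefine the numerator,
    something like "global_fold_change" (placeholder key names).
    """
    dist = levenshtein(design_a["sequence"], design_b["sequence"])
    if dist == 0:
        return 0.0
    return abs(design_a[metric] - design_b[metric]) / dist
```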
whbob
Looking forward to viewing your files, but the docs are not allowing me to view them.
Gerry Smith
I think I corrected for link sharing.  Try now.
Zama
Works for me- THANKS!
Gerry Smith
Here are files for the Top 300 Global Fold Change Enhancement Pairs for both Inc and Dec of A*B/C^2 TB Round2 105.

Dec pairs are based on the highest Global Fold Change Off, and Inc pairs on the highest Global Fold Change On.

Also Histogram links have been added.

A*B/C^2 Dec
https://docs.google.com/spreadsheets/d/1ndNdfLsNXk0QgHBHdTBhB2g80OpKWrO5Atf3_UnDU3M/edit?usp=sharing


A*B/C^2 Inc
https://docs.google.com/spreadsheets/d/1qosmEuQXnPRKmLoDZZq8pRHXbnGyvtvqdqPP_usJtfQ/edit?usp=sharing
Atanas Atanasov
Is it possible to group/cluster the designs by similarity this way?

When doing analysis, it is hard for me to get an idea of the statistical significance of a theory if I can't count the number of distinct designs that are in the results data. To get an idea of the problem, imagine the extreme: imagine that all the submitted designs were mutations of one single design. Then you have tons of data, but any hypothesis you make might only be relevant to that single original design. So you get the impression that your hypothesis works for 100% of the designs, but that 100% is of 1. What I'd like to get is some way of clustering similar designs, so when I test a hypothesis I can see how different design clusters behave. Also, this would give a clearer picture of how many designs we actually produced rather than mutated.
I think I saw once a diagram that tried to depict this (maybe in some video from eternacon?).

I prefer to work on the data from Round 1, as I think there are more unique designs vs. mutations there (most people don't yet know what works and are more keen to start from scratch than mutate an existing design).

Maybe one way is to define a mutation of a design as a design at a distance below some constant. Then, to discover a cluster, you pick a random design, add its mutations to its cluster, and repeat recursively for the members of the cluster. Then repeat the cluster-discovery procedure until no designs are left that are not part of a cluster.
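A hedged sketch of that procedure (it amounts to single-linkage clustering, i.e. finding connected components in a distance graph). `levenshtein` is assumed to be the edit-distance function sketched earlier in the thread, and the distance threshold is a parameter.

```python
from collections import deque

def cluster_designs(sequences, max_dist=5):
    """sequences: dict of {design_id: sequence}.  Returns {design_id: cluster_id}."""
    unassigned = set(sequences)
    cluster_of = {}
    while unassigned:
        seed = min(unassigned)   # smallest remaining ID becomes the cluster ID
        unassigned.discard(seed)
        cluster_of[seed] = seed
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            neighbours = [d for d in unassigned
                          if levenshtein(sequences[current], sequences[d]) <= max_dist]
            for d in neighbours:
                unassigned.discard(d)
                cluster_of[d] = seed
                queue.append(d)
    return cluster_of
```

Note that this still compares designs pairwise in the worst case, so it will be slow on a full round of results.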
Gerry Smith
Hi Atanas -

Yes, clustering analysis should be possible.  As I read your comments, I was trying to remember a self-clustering method I had come across several years back.

I'm new to this sort of scientific analysis...just really started earlier this month.  Omei recommended R programming (so I had my son teach me that a couple of weeks ago), and Omei also explained the Levenshtein distance function (which my son also showed me how to use via an R function).  Omei has kept me on the tracks and, a couple of times, has gotten me back on the tracks in this project.  And Eli always gives me much more than I can fully comprehend.

There are a great many R functions available.  I will look into clustering functions, and we can also PM each other on Eterna to see if I can create something that would be more useful for you.

Gerry  
Atanas Atanasov
I've tried some naive clustering with the Levenshtein distance. It looks promising for our data set (and the typical mutations that players make). It is kind of slow for large sets; for the Round 1 results (10,000 designs) it takes ~10 minutes to complete.

You can have a look at the results here:
R104 results
R105 results
The second column called ParentDesignId is the ID of the cluster (and is also the smallest design ID of the designs in that cluster).

I consider mutations of a design to be:
- designs at a distance less than or equal to 5
- mutations of mutations

I'm open to suggestions for improvements.
Atanas Atanasov
Here is some speculation for the pair 7241098 vs 7242010:

Take this with a huge grain of salt!
From the histograms it looks like there is a shift in the KD in every experiment where the C oligo is present and no shift (or very small) where the C oligo is absent.
This leads me to believe the change is affecting the binding of C with the design.

Then I ran the design in NUPACK with just the design and no other oligo.
When I run it using the energy model from 1999, the two MFE structures seem to be different.
You can take a look:
With A:

With C:


The ensemble defect is 25% vs 40%, which means the with-C structure depicted above is less probable than the with-A structure depicted above.
So it seems base 1 has a higher probability to bind to base 42 (which is U).
And probably the more important part is that the region from base 1 to base 15 and the corresponding paired region 29 to 41 depicted above have a higher probability of binding.

If you look in Eterna, the binding sites for the participating oligos are:
R - 2 to 14
C - 17 to 26 and 28 to 37

So I speculate that having A on position 1 is making it harder for the C-oligo to bind to its place because 28-37 has some overlap with 29-41.

Unfortunately my attempts at building a hypothesis that holds for all designs are failing. To say it in a simpler way: there are probably examples where you would see a similar picture to the one above and there would be no shift in the experimental data. So take this with a grain of salt.
Gerry Smith
CORRECTION:

The second histogram link is incorrect in the shorter note about where the U is better than A.

The correct histogram link is:

http://s3.amazonaws.com/eterna/labs/histograms_R105/7230655.png

Thank you to both Omei and Eli for catching this.  Sorry for the confusion; I will try to triple-check my links in the future!
whbob
I downloaded Omei's spreadsheet with Central Fold Change data added as a csv file. Using my local spreadsheet, I grouped the KD data so that the concentration conditions 1 to 16 listed in the 3D graph were together.  Conditions 1 to 7 are in yellow on the left. Conditions 8 to 16 are on the right. The light yellow and light blue are the conditions used for just Central Fold Change. The entire yellow-to-blue range of conditions (1-16) is used for Global Fold Change.
Cells with Red borders are the OFF fold changes, Green are the ON fold changes.

The image below is of AndrewKae's better Global Fold Change designs.
The 1st yellow column is KD100nm_C alone.  TB-C, by itself in solution, is the best at letting the reporter shine.  TB-A is best at making the reporter not shine.

I set this up manually and it was a slow process.  If anyone has suggestions on how to automate this process in any way, I would welcome their suggestions.
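One possible way to automate that regrouping, sketched with pandas; every file and column name below ("r105_results_with_fold_change.csv", "DesignId", "Designer", "GlobalFoldChange", "Cond1" ... "Cond16") is a placeholder that would have to be matched to the actual spreadsheet headers.

```python
import pandas as pd

# Hypothetical export of the spreadsheet with fold change columns added.
df = pd.read_csv("r105_results_with_fold_change.csv")

# Put identifying columns first, then conditions 1-16 side by side.
id_cols = ["DesignId", "Designer", "GlobalFoldChange"]   # placeholder names
cond_cols = [f"Cond{i}" for i in range(1, 17)]           # placeholder names
df = df[id_cols + cond_cols]

# Sort so the best Global Fold Change designs come to the top, then save.
df = df.sort_values("GlobalFoldChange", ascending=False)
df.to_csv("regrouped.csv", index=False)
```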

In order for the INC puzzle to work in the same way, TB-A or TB-B would have to perform like TB-C did in this puzzle. TB-C would have to suppress the reporter.

Presenting the lab results in this way has allowed me to understand the reasons why Andrew's solutions have done so well.
  


whbob
Sorry Gerry, I wanted this to be at the end of all the posts, but it somehow got tagged to the end of your post.
Eli Fisker
whbob, I love what you did with the colors and the column sorting. This is helpful!

Plus your explanation nails it.

"TB-C, by itself in solution, is the best at letting the reporter shine.  TB-A is best at making the reporter not shine."

To highlight his points, I dragged the columns for the A, B and C inputs at the same 100 nanomolar concentration over next to AndrewKae's designs.

   
worseize
 
Pictures have always been my best type of data for learning rules or finding patterns; here is one example of the best lab designs. I think it is the fastest way to understand the data.
Gerry Smith
Oddball1 (I will name these so they can be identified)

Round 105 A*B/C^2 Dec

Notes: all of these designs have high-scoring Global Fold Change Off values.  There are no structural changes in any of these states.

When nt4 = U, as in this first pair, I think I understand why nt1 as a U is better than a G.  U repels nt4, rather than being attracted to it...right?  And GFC Off is 2.59 vs 2.03, an improvement of 0.56.

But if that is the case, how do you explain this....

When nt4 = C (with all other nt's being the same as in the previous pair), why is nt1 better as a G than a U?  GFC Off is 2.81 vs 2.36, an improvement of 0.45.

I would think that a G and C in proximity would be more destabilizing to the dangle....


FIRST PAIR (where nt4 = U and nt1 is better as U than G)
http://www.eternagame.org/game/solution/7113332/7230679/copyview/
http://www.eternagame.org/game/solution/7113332/7230668/copyview/

First pair histograms:
http://s3.amazonaws.com/eterna/labs/histograms_R105/7230679.png
http://s3.amazonaws.com/eterna/labs/histograms_R105/7230668.png

SECOND PAIR (where nt4 = G and nt1 is better as G than U)
http://www.eternagame.org/game/solution/7113332/7230690/copyview/
http://www.eternagame.org/game/solution/7113332/7230694/copyview/

Second pair histograms:
http://s3.amazonaws.com/eterna/labs/histograms_R105/7230690.png
http://s3.amazonaws.com/eterna/labs/histograms_R105/7230694.png
Omei Turnbull, Player Developer

Gerry, these are great questions!  I’ve looked at many of your pairs, and there are both similarities and differences in how I interpret the results.  Trying to organize all my thoughts coherently hasn’t gotten anything written down, so I’m going to just choose one and talk about it.

Let’s consider the pair AK1.1 vs AK1.2, e.g. IDs 7230668 and 7230678.  For convenience, I’ll repeat the links you provided.

Game URLs:

AK1.1: http://www.eternagame.org/game/solution/7113332/7230668/copyview/

AK1.2: http://www.eternagame.org/game/solution/7113332/7230679/copyview/

Switch Graph URLs:

AK1.1: http://s3.amazonaws.com/eterna/labs/histograms_R105/7230668.png

AK1.2: http://s3.amazonaws.com/eterna/labs/histograms_R105/7230679.png


The sequences are


with the only difference being in base position 1.  The mutation from A to U shifted the global fold change from 2.03 to 2.59.

Let’s compare the switch graphs to get a fuller sense of what changed.


The 3D graphs don’t show any obvious overall shift in coloring.  Looking at the KD curves, I can see the small increase in the gap between the low and high KD curves (highlighted with large red dots).  But the KD curves are calculated from values over the full range of reporter concentrations, and this can make it difficult to identify a basic mechanism causing a shift in KD.

Only recently have I come to appreciate the value of looking closely at the FMax values.  The FMax values have a straightforward interpretation.  In principle, they represent the average amount of reporter bound to each design under saturation conditions.  Saturation conditions means the binding has reached its limit -- adding even more reporter to the mix won’t increase the amount of reporter bound.

FMax values are scaled so that 1.0 corresponds to the “normal” case of the reporter being fully bound to the design, i.e. where all the reporter bases are paired with the design, with no mismatches or gaps.  But “normal” is not really well defined.  Instead, it is (usually) empirically determined as part of the experiment, using “control” designs that don’t have any known peculiarities.  So small deviations in FMax (say in the range 0.9 - 1.1) may not mean much.  But variations beyond that can be very meaningful.

In the case of AndrewKae’s A*B/C^2 DEC designs, notice that the FMax values fluctuate around 1.5, rather than 1.0.  A consistent value this high almost assuredly means one thing -- there is a second binding site for the reporter.  If this second binding site had been intentionally designed, the FMax values should be centered around 2.0.  An intermediate value of 1.5 suggests that the secondary site was not intentional, and even under saturation conditions, it binds only about half as much reporter as a “normal” binding site.


Sliding the reverse reporter sequence along, the secondary binding segment jumps out.


This is definitely a weaker binding, but it is being reinforced by coaxial stacking with oligo C, so high concentrations of oligo C should increase FMax.  (The details of the stacking relationship in this case are complicated, but probably enhanced, by the presence of the aromatic fluorophore tacked on at the 5’ end of the reporter.)  On the other hand, oligo B is competing for the secondary reporter site, so higher FMax would be expected to be associated with lower concentrations of B. Oligo A concentration seems to have only an indirect effect via its cooperation with oligo B.  So the expected ordering of concentrations to maximize FMax is [B] ≤ [A] < [C].  The reverse order, [B] ≥ [A] > [C], should lower FMax.

This matches the observed data well. I have marked the outlying FMax values for AK1.1 with red rectangles.  The three conditions with high FMax are 4 (B=0, A=5, C=100), 6 (B=5, A=5, C=100) and 10 (B=25, A=100, C=100).  The one condition that stands out on the low end is 13 (B=100, A=100, C=0).

So now we have a reasonable explanation for the range of FMax values, but it doesn’t address Gerry’s original question of why mutating base 1 from A to U improves the global fold change, or for that matter, why it affects it at all, given that base 1 doesn’t seem to pair with anything.

Well, a good part of the reason I started by calling attention to the FMax values of around 1.5 instead of 1.0 was to introduce the notion that, quite often in the OpenTB puzzles, oligos will bind in more places than intended.  To evaluate what could be happening with that dangling 5’ end, we need to see what oligo might find a secondary binding site there.


The strongest binding at the 5’ end is shown below.


At first glance, the 5-base binding between the design and oligo B appears to be only moderately strong.  But closer examination shows that with high reporter concentrations, it becomes supported by coaxial stacking with the reporter.  So conditions with high C concentrations should have their FMax boosted.  


Now consider what happens when base 1 is mutated to a U:



The oligo C stack is extended all the way to base 1.  In Johan’s array experiments, our RNA designs are flanked on both ends by strong double-stranded DNA, with the result that the first two RNA bases essentially become a continuation of the DNA stack, and the Tether/Oligo C/Reporter combination becomes one contiguous helix.

This is a very compelling argument for why the A→U mutation at base 1 has a significant effect.  Unfortunately, it gets ambushed by the experimental results.  While the strong combined stack does seem to “tighten up” the design in the sense that the curves for the various conditions become more parallel, it incorrectly predicts the effect on FMax.  Instead of generally raising the FMax values for AK1.2, it lowers them slightly.  

So I have missed something. It probably means I have left out an important aspect of the RNA interaction.  Or it could be I simply flubbed the logic of my reasoning.  But in any case, this is where the story ends for tonight. :-)

Omei Turnbull, Player Developer
"Does the reporter bind with A & B, or does it bind with 2 C's" may not be the best way to frame the question. In the best DEC designs from Round 2, the reporter did not have a "direct " connection with any of the oligos.  Here, for example is a summary of Andrew's top scoring design.


Andrew's current work, A*B/C^2 Design For Both DEC and INC, is based on first finding a good "neutral" switch, one that cleanly switches between two distinct foldings at the OpenTB ratio but isn't strongly biased toward INC or DEC.  Of course we don't have results for those yet, but I think that line of investigation is very promising.

FWIW, what I think we most lack right now is a more extensive analysis of the INC puzzles, comparable in detail to the attention we have given to the DEC puzzles. 
whbob
Thanks eli & rhiju for your interest.

Rephrasing the question, will the reporter be allowed to bind in the presence of A, B or C?

  
This modified spreadsheet has helped me to see why the R105 lab reacted the way that it did.

KD100nM_C added to the solution did not restrict the reporter from producing a very small (bright) value. TB-C (value 0.4) does not restrict the reporter much at all.
Not seen here, KD100nM_B does not restrict the reporter either. Its value is 2.34.
TB-A, however, does restrict the reporter from illumination. KD100nM_A is a very dark 91.86.

So, regardless of an INC or DEC puzzle, using the R105 inputs and reporter will not restrict illumination if TB_C is present, but will restrict illumination if TB-A is present.

This seems to be independent of what design sequence may be used. 
Eli Fisker
@Rhiju asked: "That's the difference in the two types of puzzles -- does reporter bind with A& B, or does it bind with 2 C's. Any thing you can hypothesize about reporter attractions with A,B, and C?"

@Whbob rephrased the question: "Rephrasing the question, will the reporter be allowed to bind in the presence of A,B or C?"

I'm biting on that...


Enhancing a reporter bind



The best way of getting a reporter bind is to have it next to the input that needs to bind too. This enhances the bind of both the reporter and the input (likely cooperatively).

Binding of the reporter depends very strongly on proximity of the reporter to the input in question that needs to bind.

If the reporter is next to an input (coaxial), the input and the reporter are generally going to bind. Omei realized that coaxial stacking is beneficial.

A way to illustrate this point is the acdec and bcdec labs. These labs have two inputs each, plus a reporter. The reporter is only going to be present in one state.

The labs solve in this way:

  • State with the reporter + input = input + reporter next to each other
  • State without the reporter binding = second input as far away as possible from the reporter + the first input.
The second input is long enough to bind on its own and is stabilized by a coaxial switching stem beside it.


Background post on these labs:

Rocketdog and an experiment
Input order - depends on if it is an ON switch or an OFF switch



Getting rid of a reporter bind


There are several ways of getting rid of a reporter bind:

  • Keep the reporter on its own far away from inputs to prevent it from binding in the first place.
This is not going to work if you put the reporter smack at the 5' or 3' end, as it will then coaxial stack with the DNA tethers on the outside. In other words, putting the reporter in the middle of the sequence is less beneficial for reporter bind. There are ways to override this if you need a middle reporter position, like having coaxial switching or static stem(s) beside it.
  • Destabilize the input it sits next to - this can be done by lane sharing, so that the input next to the reporter partially shares a landing site with another input. Then, raising the concentration of the other input will tip off the input next to the reporter. When the reporter is next to a loop sequence - i.e. not coaxial stacking - it will cease to bind well. (This gives a weaker reporter knockout.)
  • Long-range single-base turnoff sequence (also a weaker knockout effect)
  • Strong reporter knockout - have the reporter complement attracted to nearby sequence in one of the input complements.
Andrew’s Key: Reporter turnoff proximity matters

So basically, which input complement the reporter complement needs to be turned off against will depend on which input the puzzle needs to have absent in which state. This will differ depending on the puzzle.
Eli Fisker
One more thing. In several past hard dec and inc solves, the C input has stayed on in the last state, where the A and B inputs need to bind (with or without the reporter).

In the hard inc puzzle some C inputs were coaxial to the A or B input. Since the C input is always present in a fairly high concentration, I can only imagine that having it coaxial to other inputs will enhance their bind too.

Just as a single input next to the reporter enhances the reporter bind.

Here is a puzzle example from one of the better hard inc designs


http://www.eternagame.org/game/solution/6892317/7001278/copyandview/

The above puzzle uses a mix of two strategies: one where you tip the balance between two inputs by having them share a lane, and another where the first C input is close to the coaxially stacked input A. All of the inputs binding in state 4 are almost coaxially stacked with each other. There is one long train of turned-on inputs.
Eli Fisker
@Whbob, what you said makes really good sense:

"KD100nM_C added to the solution did not restrict the reporter from producing a very small ( bright) value. TB-C (value 0.4) does not restrict the reporter much at all."

I know you made your modified spreadsheet over AndrewKae's best designs. I took a look at the first one in the spreadsheet (7233512). I know that the others of AndrewKae's best designs follow the same main template.

Here is a visual of it, with the C input highlighted and reporter highlighted.

    

http://www.eternagame.org/game/solution/7113332/7233512/copyandview/

The reporter is absolutely closest to the C input. While the reporter is not coaxial stacked, it is still close enough that it lights up big time, just with the C input present.


Extra in relation to solving the puzzle

Plus the reporter is a good deal away from the A and B inputs that need to bind in state 4 without the reporter. Same principle as mentioned a couple of posts above this one: keep the reporter distanced from whatever input needs to be bound without having the reporter around.
Omei Turnbull, Player Developer
 Recently I have become very intrigued by the potential of analyzing the OpenTB results by focusing on FMax instead of (or perhaps in addition to) KDs and fold change. Compared to KDs and Fold Change, focusing on FMax has a number of advantages:
  • It is more easily understood,

  • It is more easily measured,

  • It relates more closely to what a paper-based diagnostic can directly display, and

  • It appears that there is a straightforward way to control it when designing.

Sounds pretty good, huh?

To illustrate these points, I’ll use a switch graph from Synthesis Round 99 that had only one input and 2 states shown on the graph.


It is more easily understood: FMax, shown in blue, is simply the maximum obtainable fluorescence, regardless of how much reporter oligo is added.  Fold Change, shown in green, is the ratio of two reporter concentrations, the concentrations at which the fluorescence is half of FMax.

It is more easily measured: FMax can be measured with just one experimental condition, essentially by taking the median value of the right-most column of dots.  Fold Change, on the other hand, requires measurements at a whole series of reporter concentrations, which are the 18 columns of red dots.  From these 18 conditions, a curve is predicted using a simplifying assumption (which the OpenTB data shows is of questionable applicability), and that predicted curve is used to get a Fold Change value.
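To make that contrast concrete, here is a rough sketch assuming a simple one-site binding model, F([R]) = FMax * [R] / (KD + [R]); this is only an illustration and may not be the model used in the actual analysis pipeline. FMax needs little more than the saturating column of dots, while KD (and hence fold change) needs a fit over the whole titration series.

```python
import numpy as np
from scipy.optimize import curve_fit

def binding_curve(reporter_conc, fmax, kd):
    """Single-site binding model (an assumption, not necessarily the pipeline's model)."""
    return fmax * reporter_conc / (kd + reporter_conc)

def estimate_fmax(fluorescence_at_saturation):
    """FMax alone: roughly the median fluorescence at the highest reporter concentration."""
    return float(np.median(fluorescence_at_saturation))

def fit_fmax_and_kd(reporter_concs, fluorescences):
    """KD requires the whole titration series plus a curve fit."""
    (fmax, kd), _ = curve_fit(binding_curve, reporter_concs, fluorescences,
                              p0=[max(fluorescences), np.median(reporter_concs)])
    return fmax, kd
```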

It relates more closely to the constraints of a low-cost paper-based diagnostic:  The “output” of a paper-based diagnostic is one or more colored dots, where the color indicates how much reporter has bound to the RNA in the sample. The simplest paper-based diagnostic could conceivably use just two dots -- one whose color depends on the reporter binding level and a fixed color (control) that the variable color is compared to.  If the single measurement condition is in the reporter saturation range, then the color can be very insensitive to reporter concentration variations caused by manufacturing variation or “shelf aging” over time.  On the other hand, a measurement taken in the vicinity of a KD value will be at the point of maximum sensitivity to reporter variation, because that is where the binding curve is rising most rapidly.

It appears that there is a straightforward way to control it when designing: When Johan ran the first array experiments, he observed that not all the designs had the same FMax.  The obvious explanation for this was that the design “mis-folded” under reporter saturation conditions. This, in turn, was based on the desire to make binary switches that could be turned completely ON or completely OFF, depending on whether the reporter bound or didn’t bind.  So a switch that only partially bound the reporter, even under saturation conditions, was outside the scope of designs of interest.  Hence, he introduced the “folding score” component of the Eterna score, which penalized designs that had an FMax lower than 1.0.  In turn, I suspect most players (including me) tended to dismiss designs with low FMAX scores in the ON state as being “defective” in some way.

Nevertheless, some designs that had quite good fold changes did have low FMax scores.  In the design above, the ON and OFF states have essentially the same FMax value, but it is much closer to 0.5 than to 1.0.  Furthermore, some designs had completely different FMax values for different concentrations, as seen here:


Not only has Oligo B shifted KD significantly, it has lowered FMax by about two thirds.

Why do I say it appears to be a straightforward way to control FMax?  There are still plenty of details to be filled in -- which is why I am posting this here, to encourage others to help analyze the past data -- but here are things I am certain of:

  • Large scale adjustments to FMax can be achieved by having multiple reporter binding sites.  

  • Finer control (either increasing or decreasing) of each binding site can be achieved with a combination of varying the number of complementary bases at the reporter binding site and varying the degree to which the reporter binding is supported by contiguous stacks on either end, ones that are created (or not) by the binding of an input oligo.  Used together, these can make at least a 50% change (either higher or lower) in the maximum luminance of a single reporter binding site.

I will follow this post up with more specifics for what past experiments say about controlling FMax, but not tonight.

Eli Fisker
Sensor A, B or C designs and Fmax


I have taken a look at the sensor designs in OpenTB round 2 and pulled the designs with an FMax of close to 3 and above. I have added links to them below.

Out of habit I applied a max fold change error cutoff of 1.25 to sort out questionable data, even though I'm not looking at fold change.


Summary of trends for now

  • The RIRI sensors with a high FMax tend to fold up like AndrewKae's hairpin stem, just with the change that the switching hairpin can get really long. There is a lot of burying of sequence deep down in the designs.
  • The RIRO sensors with a high FMax tend to be of the lane-sharing kind, with competing and overlapping inputs, though a few designs also show burying behaviour.
  • More high FMax values are showing up in the RIRO labs than in the RIRI labs.


So just as our switches so far behave structurally differently depending on whether they are ON or OFF switches, FMax also gives different structural outcomes for ON and OFF switches.



Sensor A riri

http://www.eternagame.org/game/solution/7113220/7161838/copyandview/


Sensor B riri

http://www.eternagame.org/game/solution/7113221/7127402/copyandview/
http://www.eternagame.org/game/solution/7113221/7127408/copyandview/
http://www.eternagame.org/game/solution/7113221/7127406/copyandview/
http://www.eternagame.org/game/solution/7113221/7241423/copyandview/


Sensor C riri

http://www.eternagame.org/game/solution/7113226/7241449/copyandview/
http://www.eternagame.org/game/solution/7113226/7245839/copyandview/


Sensor A riro

http://www.eternagame.org/game/solution/7113217/7120127/copyandview/
http://www.eternagame.org/game/solution/7113217/7250232/copyandview/
http://www.eternagame.org/game/solution/7113217/7116837/copyandview/
http://www.eternagame.org/game/solution/7113217/7120142/copyandview/
http://www.eternagame.org/game/solution/7113217/7176377/copyandview/
http://www.eternagame.org/game/solution/7113217/7116841/copyandview/
http://www.eternagame.org/game/solution/7113217/7143402/copyandview/
http://www.eternagame.org/game/solution/7113217/7188626/copyandview/

Sensor B riro

http://www.eternagame.org/game/solution/7113218/7239709/copyandview/
http://www.eternagame.org/game/solution/7113218/7241185/copyandview/
http://www.eternagame.org/game/solution/7113218/7123686/copyandview/
http://www.eternagame.org/game/solution/7113218/7241377/copyandview/
http://www.eternagame.org/game/solution/7113218/7123693/copyandview/
http://www.eternagame.org/game/solution/7113218/7123684/copyandview/
http://www.eternagame.org/game/solution/7113218/7239705/copyandview/
http://www.eternagame.org/game/solution/7113218/7123682/copyandview/
http://www.eternagame.org/game/solution/7113218/7241375/copyandview/
http://www.eternagame.org/game/solution/7113218/7241371/copyandview/
http://www.eternagame.org/game/solution/7113218/7241369/copyandview/
http://www.eternagame.org/game/solution/7113218/7241367/copyandview/
http://www.eternagame.org/game/solution/7113218/7123691/copyandview/
http://www.eternagame.org/game/solution/7113218/7241189/copyandview/
http://www.eternagame.org/game/solution/7113218/7120008/copyandview/
http://www.eternagame.org/game/solution/7113218/7120025/copyandview/

Sensor C riro

http://www.eternagame.org/game/solution/7113219/7133735/copyandview/
http://www.eternagame.org/game/solution/7113219/7116873/copyandview/
http://www.eternagame.org/game/solution/7113219/7241391/copyandview/
http://www.eternagame.org/game/solution/7113219/7143548/copyandview/
http://www.eternagame.org/game/solution/7113219/7126730/copyandview/
Eli Fisker
Omei, thx for your note on subtraction rather than division being the more correct way to compare FMax highs and lows.

NB: The FMax values I have pulled out in the above post are by division, which left a result of near 3 or above, not subtraction. So the above designs are those with an extremely high FMax.
Omei Turnbull, Player Developer
@Eli, I've changed my mind; I think you were right in using division for comparing FMax values.  The compelling argument, in my mind, is that the choice of how to scale the luminance values (i.e. what value to choose to be 1.0 on the scale) is really not well defined in the current experiments.  Comparing FMax values by division makes the choice of what is called 1.0 irrelevant, whereas subtraction does not.
whbob
I like the idea of using FMax. It is the signal to listen to.  I had no luck in folding the design molecule in the AB/C2INC Round 2 lab.

In the present lab (Round 4), I started a new strategy: minimize FMax when the TB-C oligos are at 300 nM concentration.  I needed to have TB-C mask the r' design sequence that attracts the reporter oligo. I got state 3 to do that.

State 3: 



The marked bases are the r' reporter attractor sequence.  The TB-C oligos are covering the area where the reporter oligo would attach.

State 4:



A combination of the reporter and the TB-A & B oligos pushes out the TB-C oligos in state 4.

The problem is that when the TB-C oligos come back in state 3, they can't push away TB-A & B.

If you go to this design and press the "U" key, you will step through all of my designs for this strategy.  Perhaps this is not possible given the nature of the TB-A & B bases.

I'm hoping that others will try mutating this strategy.  I feel confident the TB-C oligos will make for a very low (dark) reporter signal.  Although state 4 will produce a very bright signal with TB-A & B, it will be bright in state 1 also.  Not good :(

Eli Fisker
Whbob, thx for sharing your designing strategy and the reasoning behind it.
Eli Fisker
Why the INC2DEC designs did badly

  • Most of them don't get any switch score - there is little difference between KD ON and KD OFF
  • Most of them don't get much baseline score - meaning the reporter has a hard time binding
  • The few designs that do get a switch score all have one thing in common: a strand pairing up with the reporter complement when it needs to be turned off.
  • So making a switching hairpin out of the reporter complement and a nearby input complement may be of help.

We have been discussing in this forum post why the ABC2INC design has done badly in Round 2. I took another look at the data and noticed something I wish to share. I think whbob may have said some of this before, so my apologies in advance if I repeat.


General data trends

There are no scores above 60%. So far, so bad. However, when I look at what the score consists of, a few things stand out in particular. It is to a large degree the switch score that is missing.


Switch score

Johan explains the switch score like this:

The Switch score is based on the KD of the ON-states and the OFF-states.

In other words, the switch score is the difference in KD between the on and off states. So there isn't a huge difference between ON and OFF.


Baseline score

Also, a lot of the designs don't get a good baseline score. This was also a problem for the round 1 and 3 ABC2INC designs.

Johan explains Baseline score like this:

The Baseline subscore is based on the KD of the ON-state. The score is 100 for KD < 10 nM, 0 for KD > 30 nM, and decreases linearly in between.

In other words, we get the full baseline score if the ON state is at 10 nM KD or below, and part of the score up to 30 nanomolar. If the reporter needs to be present at a higher concentration than that, we get a baseline score of 0.

(Johans document)
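A direct transcription of that rule as a small function; the boundary behavior at exactly 10 and 30 nM is my assumption.

```python
def baseline_subscore(kd_on_nM):
    """100 for KD < 10 nM, 0 for KD > 30 nM, linear in between (per Johan's description)."""
    if kd_on_nM <= 10:
        return 100.0
    if kd_on_nM >= 30:
        return 0.0
    return 100.0 * (30 - kd_on_nM) / (30 - 10)
```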


What characterizes the designs that get some switch score?

I pulled a sort by switch score

               

The designs with a switch subscore above 0 all have a strand pairing up with the reporter complement when the latter needs to be turned off.

Here is an example from JR's design

               

The marked bases are the reporter complement.

So what may help this round for the ABC2INC lab is to use switching hairpins made up of the reporter complement and an input complement, as Andrew did in his RIRI sensor design. He is also doing this in a lot of his ABC2INC solves.
cynwulf28
*I'm posting this message (sic) here at Eli's suggestion*

I've been working on something that might help with the ABCINC labs (...and other labs if I did the same for them). I noticed in reviewing titles that most designs are just modifications of other designs...and this led me to want to find out two things: 

1) Which designs are original unique designs, and 
2) Which designs are being modded the most...too much/too little. 


From this data I hoped to be able to find out which approaches to designing are being used and which should be used. By isolating unique designs, and by grouping sets of designs with their mods, this allows for better review of designs via wuami analysis, as a run of only a few sequences from a mod-set should give a representative view of that set as a whole.
To assist with all of this, I created a Word doc of all ABC-INC sequences from R4 (up to a week or two back) and (following extremely tedious formatting to isolate designs) I have annotated the designs in such a way as to group mod-sets together with the original design using similar formatting changes.

Hopefully the link works: 
https://docs.google.com/document/d/15LKg7vFD8BrXTIYWrPf7WHA4_q59AaTaMS254lc6zg0/edit

I had also started to run sequences through both the wuami charts and the Lab itself. Sequences in bold (mostly at the bottom of the page) indicate designs I ran through the wuami page. If a design meets the criteria for the Lab AND looks good in the wuami chart, it is in Arial Black font (nice and bold); if it works in the Lab but fails the wuami chart, it is in Algerian font; and if the design failed to meet the criteria of the Lab in the first place, I put a strike-through over it. I admit that the colors I used are not the best, but I was running out of unique formatting options. For instance, your Monster Hairpin design and mods of it are highlighted in deep royal blue.

As the hard lab puzzles seem to be bugged for me, I am trying to contribute however I can. This is still a work in progress, but at the end of the day it should give a flowchart which shows the history of design creation, which designs have or have not been modded, and the extent to which designs have been modded, as well as exposing any potential player bias in regard to which designs get the most attention and which designs deserve said attention.
Andrew Kaechele
What is the convention for showing the original design vs. mods? Is the original last in the series?
cynwulf28
Indeed, that would be good to know! The entire list is arranged in the order in which the designs were submitted with the oldest designs at the bottom and the newest ones on top. As no dates are provided I admit that we only have a relative chronology and not a specific chronology. Also note that these designs are all from round 4 and so some designs which appear to be newly created this round may very well be a mod from a previous round. If time allowed I would do a full design Map for all 4 rounds...but I don't think that is likely to be done in time to help us with designing, though such a map might be useful with hindsight by revealing player patterns and bias.
Andrew Kaechele
Thanks cynwulf28.