Analyzing the Ribosome Challenge pilot round results

  • Updated 1 month ago
The first experimental data for redesigning the ribosome is here! What can we learn from it?

Omei Turnbull, Player Developer

  • 1008 Posts
  • 324 Reply Likes

Posted 5 months ago


Eli Fisker

  • 2289 Posts
  • 518 Reply Likes

Ribosomal Binding Sites in RNA violated in our designs


The E. coli ribosome consists of more than an RNA core made from 3 RNA chains (5S, 16S and 23S). It is clamped together by a large number of smaller proteins: 51 ribosomal proteins, to be precise.


28 ribosomal proteins bind to the large subunit (LSU), 23S

20 ribosomal proteins bind to the small subunit (SSU), 16S

6 ribosomal proteins bind to 5S.


If you notice that the protein numbers add up to more than 51 (28 + 20 + 6 = 54), it is because some of the ribosomal proteins that touch 23S also bind to 5S.



How do ribosomal protein binding sites relate to our lab designs?


I have collected all the RNA motif and ribosomal binding site violations in one document. 


5S, 16S and 23S: Motif and ribosomal protein binding site violations


There is only one 23S design (2.17) that violates neither ribosomal binding sites nor RNA motifs.


Also, I find Astromon's 2.21 23S design really curious. It is one of the designs with a lot of mutations: 43. Most of the designs with such huge mutation counts do really badly. Yet it is low in both RNA motif and ribosomal binding site violations (13 in total) compared to the other designs with many mutations. And this design does really well.


Similarly the 16S design 2.11 with 21 mutations, but only 7 total violations does well too. 


This suggests that one can get away with a lot of mutations if one is a little careful not to violate too many motifs or ribosomal binding sites, even though the ribosomal binding site violations are already in accordance with IUPAC.


Among the 5S sequences, the design with the fewest RNA motif and protein binding site violations, Gerry Smith's 2.05, does close to best.



Our ribosome lab designs versus RNA motifs and ribosomal protein binding sites


5S:

2.01: 0 motif violations, 2 binding violations - total 2

2.02: 1 motif violation (Platform), 2 binding violations - total 3

2.03: 0 motif violations, 1 binding violation - total 1

2.04: 0 motif violations, 2 binding violations - total 2

2.05: 1 motif violation (Platform), 0 binding violations - total 1

2.06: 0 motif violations, 1 binding violation - total 1

2.07: 0 motif violations, 2 binding violations - total 2

2.08: 0 motif violations, 2 binding violations - total 2


16S

2.09 - 0 motif violations, 2 binding violations - does fair (2M)

2.10 - 0 motif violations, 3 binding violations - does fair (13M)

2.11 - 1 motif violation, 6 binding violations - does fair (21M) (Z-turn)

2.12 - 7 motif violations, 20 binding violations - does bad (55M) (2 A-minor) (5 G ribo)

2.13 - 2 motif violations, 8 binding violations - does OK (17M) (2 G-ribo)

2.14 - 2 motif violations, 4 binding violations - does bad (11M) (Platform, GA-minor)

2.15 - 2 motif violations, 2 binding violations - does fair (4M) (Same Z-turn motif) 

2.16 - 6 motif violations, 21 binding violations - does bad (55M) (A-minor, Z-turn, Loop E submotif, 2 G ribo)


23S

2.17 - 0 motif violations, 0 binding violations - does fair (1M)

2.18 - 0 motif violations, 2 binding violations - does bad (32M)

2.19 - 2 motif violations, 2 binding violations - does fair (15M)   (Platform/Bulged G, A minor)

2.20 - 17 motif violations, 13 binding violations - does bad (72M)  (Platform/GA minor, Loop E, GA minor, Platform, Bulged G, U-Turn, GA minor, U-Turn, Bulged-G/Platform, U-Turn, Platform)  *Two bases were changed in U-Turn, GA Minor, U-Turn*,A minor

2.21 - 5 motif violations, 8 binding violations - does fair (43M)  (UA handle, GA minor, Tandem GA)

2.22 - 0 motif violations, 2 binding violations - does OK (4M)

2.23 - 1 motif violation, 2 binding violations - does fair (7M)  (A minor non-WC pair)

2.24 - 5 motif violations, 11 binding violations - does bad (35M)  (A minor, Platform, U-Turn, Platform/Bulged G,T-Loop,Active site)
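
To put rough numbers on the trend in the 16S and 23S lists above, here is a small Python sketch that groups the designs by their reported outcome and averages the violation and mutation counts. The counts and outcome labels are copied from the lists; the tuple layout and the `summarize` helper are illustrative assumptions, not part of any Eterna tool.

```python
from collections import defaultdict

# (design, motif violations, binding violations, mutations, outcome),
# copied from the 16S and 23S lists above.
DESIGNS = [
    ("2.09", 0, 2, 2, "fair"),
    ("2.10", 0, 3, 13, "fair"),
    ("2.11", 1, 6, 21, "fair"),
    ("2.12", 7, 20, 55, "bad"),
    ("2.13", 2, 8, 17, "OK"),
    ("2.14", 2, 4, 11, "bad"),
    ("2.15", 2, 2, 4, "fair"),
    ("2.16", 6, 21, 55, "bad"),
    ("2.17", 0, 0, 1, "fair"),
    ("2.18", 0, 2, 32, "bad"),
    ("2.19", 2, 2, 15, "fair"),
    ("2.20", 17, 13, 72, "bad"),
    ("2.21", 5, 8, 43, "fair"),
    ("2.22", 0, 2, 4, "OK"),
    ("2.23", 1, 2, 7, "fair"),
    ("2.24", 5, 11, 35, "bad"),
]


def summarize(designs):
    """Return {outcome: (design count, mean total violations, mean mutations)}."""
    groups = defaultdict(list)
    for _name, motif, binding, mutations, outcome in designs:
        groups[outcome].append((motif + binding, mutations))
    summary = {}
    for outcome, rows in groups.items():
        n = len(rows)
        mean_violations = sum(v for v, _ in rows) / n
        mean_mutations = sum(m for _, m in rows) / n
        summary[outcome] = (n, mean_violations, mean_mutations)
    return summary


for outcome, (n, mean_viol, mean_mut) in summarize(DESIGNS).items():
    print(f"{outcome}: {n} designs, {mean_viol:.1f} violations, "
          f"{mean_mut:.1f} mutations on average")
```

With only 16 designs and hand-assigned outcome labels, this is a sanity check rather than a statistical result, and violations and mutation counts rise together, so it cannot separate the two effects.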


Eli Fisker

How many 5S solves without violations are there?


I wonder how many ways there are to solve the 5S puzzle in-game so that it is stable without violating any of the RNA motifs, ribosomal protein binding sites or IUPAC?


I made a solve in Vienna2 and ran the mutation booster on it to gather some more solves. 


Very conservative list of safe mods


Here are my Vienna2 attempts. There should be more legal solves when combining some of these bases. Perhaps you can come up with some other ways. And how about the other engines? Is it possible to solve there?


UGCCUGGCGGCCGUAGCGCGUUGGACCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGACGUAGCGCCGAUGGUAGUGUGGGGACUCCCCAUGCGAGAGUAGGGCACUGCCAGGCAU,,true

UGCCUGGCGGCCGUAGCGCGUUGGACCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGACGUAGCGCCGAUGGUAGUGUGGGGACUCCCCAUGCGAGAGUAGGGCACUGCCAGGCAU,G18,true

UGCCUGGCGGCCGUAGCACGUUGGUCCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGACGUAGCGCCGAUGGUAGUGUGGGGACUCCCCAUGCGAGAGUAGGGCACUGCCAGGCAU,U25,true

UGCCUGGCGGCCGUAGCACGUUGGACCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGACGUAGCGCCGAUGGUAGUGUGGGGCCUCCCCAUGCGAGAGUAGGGCACUGCCAGGCAU,C87,true

UGCCUGGCGGCCGUAGCACGUUGGACCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGACGUAGCGCCGAUGGUAGUGUGGGGUCUCCCCAUGCGAGAGUAGGGCACUGCCAGGCAU,U87,true

UGCCUGGCGGCCGUAGCGCGUUGGUCCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGACGUAGCGCCGAUGGUAGUGUGGGGACUCCCCAUGCGAGAGUAGGGCACUGCCAGGCAU,U25,true

UGCCUGGCGGCCGUAGCGCGUUGGACCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGACGUAGCGCCGAUGGUAGUGUGGGGCCUCCCCAUGCGAGAGUAGGGCACUGCCAGGCAU,C87,true

UGCCUGGCGGCCGUAGCGCGUUGGUCCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGACGUAGCGCCGAUGGUAGUGUGGGGCCUCCCCAUGCGAGAGUAGGGCACUGCCAGGCAU,U25+C87,true

UGCCUGGCGGCCGUAGCGCGGUGGACCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGCCGUAGCGCCGAUGGUAGUGUGGGGACUCCCCAUGCGAGAGUAGGGCACUGCCAGGCAU,G21+C62,true

UGCCUGGCGGCCGUAGCGCGAUGGACCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGUCGUAGCGCCGAUGGUAGUGUGGGGACUCCCCAUGCGAGAGUAGGGCACUGCCAGGCAU,A21+U62,true

UGCCUGGCGGCCGUAGCGCGGUGGACCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGCCGUAGCGCCGAUGGUAGUGUGGGGCCUCCCCAUGCGAGAGUAGGGCACUGCCAGGCAU,G21+C62+C87,true

UGCCUGGCGGCCGUAGCGCGAUGGACCCACCUGACCCCAUGCCGAACUCAGAAGUGAAACGUCGUAGCGCCGAUGGUAGUGUGGGGCCUCCCCAUGCGAGAGUAGGGCACUGCCAGGCAU,A21+U62+C87,true
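
For anyone who wants to check the labels in the second column, here is a minimal Python sketch that compares two equal-length sequences and reports the differences in the same style (new base plus 1-based position, e.g. `U25`). The function name `list_mutations` is made up for illustration, and the two strings are only the first 28 bases of two solves from the list, shortened for readability.

```python
def list_mutations(reference: str, variant: str) -> list[str]:
    """Return mutations as '<new base><1-based position>', e.g. 'U25'."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        f"{new}{pos}"
        for pos, (old, new) in enumerate(zip(reference, variant), start=1)
        if old != new
    ]


# First 28 bases of the first solve in the list, and of the solve
# labelled "U25" that keeps G at base 18.
BASE_PREFIX = "UGCCUGGCGGCCGUAGCGCGUUGGACCC"
U25_PREFIX = "UGCCUGGCGGCCGUAGCGCGUUGGUCCC"

print(list_mutations(BASE_PREFIX, U25_PREFIX))  # ['U25']
```

Which sequence to use as the reference is a choice: the labels in the list do not all seem to be relative to the same starting solve (two rows are labelled U25 but differ at base 18), so it is worth passing the reference explicitly.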

 





Eli Fisker

RNA motif and ribosomal binding site overview





16S Motifs and protein binding sites

23S & 5S Motifs and protein binding sites


DigitalEmbrace started out the 23S sheet with the locked bases and let me copy it, Omei proposed how to list the RNA motifs, and Gerry helped with finding the paired bases for 23S. Omei helped with some of the functions in the sheet, and so has jandersonlee, as I have imported a bunch of columns from the helix map sheet he started. Rhiju did the biggest job by identifying the RNA motifs and ribosomal binding sites for Escherichia coli in the first place.




Ribosome puzzles with fewer bases to mutate


By knowing which bases have a specific function, we have already reduced the task of mutating the ribosome considerably. Here is a list of the bases that are left after taking out all the bases with known functions:


Number of bases in the original puzzles in parentheses.


Very conservative list of safe mods


5S          28 bases (120)

16S        330 (1534)

23S        713 (2904)



Super conservative list of safe mods


5S           23 bases

16S         251

23S         468
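
As a quick worked example of how much the search space shrinks, this Python sketch turns the counts above into fractions of each puzzle. The numbers are the ones listed here; the dictionary layout and the `fraction_left` helper are just illustrative.

```python
# Bases left in each list, and bases in the original puzzle, as listed above.
PUZZLES = {
    "5S": {"total": 120, "very": 28, "super": 23},
    "16S": {"total": 1534, "very": 330, "super": 251},
    "23S": {"total": 2904, "very": 713, "super": 468},
}


def fraction_left(puzzle: str, level: str) -> float:
    """Fraction of the puzzle's bases that remain in the given list."""
    entry = PUZZLES[puzzle]
    return entry[level] / entry["total"]


for name in PUZZLES:
    print(f"{name}: very conservative {fraction_left(name, 'very'):.1%}, "
          f"super conservative {fraction_left(name, 'super'):.1%}")
```

Roughly a fifth to a quarter of each RNA is left in the very conservative list, and somewhat less in the super conservative one.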



We already know that we can probably get away with violating some constraints, and that not all RNA motifs are equally grumpy about getting changed. There was Jieux's design that modified a hairpin and violated IUPAC and still did well. Among the designs doing well are also some that violated some ribosomal protein binding sites. However, touching too many of the bases with a known associated function at once seems a sure way to trouble.


So this list is not to say that we can't ever mutate these bases. Rather, it is a helping hand so that we are aware when we do, and can do it with a purpose.




Overviews of Rhiju's RNA motifs and ribosomal protein binding sites 



RNA motifs


https://github.com/ribokit/RiboDraw/blob/master/drawings/ribosome/16S/rna_motif/4ybb_16S.pdb.motifs.txt

https://github.com/ribokit/RiboDraw/blob/master/drawings/ribosome/23S_5S/rna_motif/4ybb_23S_5S_RPL.pdb.motifs.txt



Ribosomal protein binding sites


https://github.com/ribokit/RiboDraw/blob/master/drawings/ribosome/16S/rna_motif/4ybb_16S.pdb.ligands.txt

https://github.com/ribokit/RiboDraw/blob/master/drawings/ribosome/23S_5S/rna_motif/4ybb_23S_5S_RPL.pdb.ligands.txt


NB: every time it says RA in any of these lists, it refers to a binding site in the 5S.


dl2007

  • 11 Posts
  • 4 Reply Likes
What is MG in the protein binding document? Is it as important as the main proteins?
Eli Fisker
Hi dl2007!

MG stands for magnesium ions. They too are positioned at specific places in the ribosome. However, most of these spots are probably not as important to conserve as the ribosomal protein binding sites. I imagine that when we mutate, if we destroy a magnesium binding site, we may create another in the process.

Magnesium ions help with stabilizing the RNA.

See under Biological Chemistry
https://en.wikipedia.org/wiki/Magnesium_in_biology

Here are a few videos about metal ions in relation to biology: 





Eli Fisker

Is it worse mutating in stems or single bases in ribosomal protein binding sites?


I have been asking around about the seriousness of mutating in ribosomal binding site bases versus mutating in RNA motifs. I sent my discussion with Omei to Andy and Antje. 


Eli: I assume it is more serious messing with the ribosomal protein binding sites than most motifs.


Omei: FWIW, I wonder about the high importance of protein site mutations, at least in helices. RNA-protein bindings are rarely hydrogen bonds, AFAIK.


Eli: I wonder if those malleable sites (according to IUPAC) in the ribosomal protein binding sites are in stems then.


Omei: ... and in a helix, the specific bases aren't exposed. The only effect it could have with a protein is the indirect one that comes from the fact that although the helix shape is pretty much uniform, specific base pairings can make small tweaks.


Eli: OK, so what I hear you say is that ribosomal protein binding sites are perhaps less sensitive to modifications than RNA motifs.


Omei: I'll hedge on that last comment. I don't really know that the proteins can never get close to the bases via one of the grooves. That's my belief. But you could ask Andy or Antje to get better advice.


I have checked the 16S RNA ribosomal binding sites. 33 of the ribosomal protein binding sites that had alternative IUPAC options for a solve were single bases. 82 of them were in a stem. So changeable stem bases outnumber changeable single bases in ribosomal protein binding sites. I also recall seeing one protein "binding" to an RNA base by having an MG ion in between them when I was viewing E. coli in Chimera.


Andy Watkins: yeah, that’s an interesting possibility, actually

you can get some specificities from the grooves, actually

a guanine’s Hoogsteen edge (basically, the “long face” of the base) pairs quite naturally with an arginine side chain; ditto an adenosine H edge with asparagine/glutamine


Eli: So it is basically the purine bases at the ribosomal binding sites that we may have to be careful about changing, under very specific conditions: arginine or asparagine/glutamine being nearby in space.

Andy Watkins: those are definitely the biggest risk factors


For details on Hoogsteen bases see the bottom of the post. 





The ribosome folds up in stages


Antje Krüger: I'm convinced that mutations at nucleotides in the vicinity of ribosomal proteins can affect their binding and thereby the folding, re-folding, assembly and stability of the ribosome. The wild-type RNA itself would not fold into its structure without them. The assembly itself is an ordered process involving RNA folding and protein binding.


Eli: I kind of imagine the ribosome as an RNA ball clamped together with ribosomal proteins, so this makes good sense. So the ribosomal proteins themselves assist with the RNA folding?


Antje Krüger: Yes, they do. I imagine the proteins as shape keepers. There are also so called ‘assembly factors’ involved in ribosome biogenesis. These temporarily bind the ribosome and then e.g. chemically modify some of the nucleotides or break base pairings and form new ones.


Eli: I have read about chaperones as aides for folding RNA. I just hadn't thought about ribosomal proteins as such.


Antje Krüger: This is a recent article about the different stages of the large subunit when it gets reconstituted in vitro (please note that it is not iSAT what I do) without assembly factors.


I suggest reading the abstract, intro and discussion first and then digging into the specifics.


Structural Visualization of the Formation and Activation of the 50S Ribosomal Subunit during In Vitro Reconstitution



Antje Krüger: and this is a recent review.


Structure and dynamics of bacterial ribosome biogenesis



Our ribosome designs use ISAT (in vitro folding), so they may not fold up like the wild type. Still, I think it is very interesting that the ribosome folds in steps. The paper identifies 5 folding states for the 23S. Perhaps we could mutate within specific ones of these states, so that our mutations hit a region that is timed to fold in the same go.



How many ribosomal binding sites are single versus stem?


Ribosomal protein binding sites

5S: 7 bases in stem, 5 single bases

16S: 82 bases in stem, 33 single bases

23S: 155 bases in stem, 77 single bases
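
Expressed as shares, the counts above look like this. The sketch below is plain arithmetic on the listed numbers; `stem_fraction` is a hypothetical helper name.

```python
# (bases in stem, single bases) among the IUPAC-changeable protein binding
# site bases, as listed above.
BINDING_SITE_BASES = {"5S": (7, 5), "16S": (82, 33), "23S": (155, 77)}


def stem_fraction(stem: int, single: int) -> float:
    """Share of the changeable binding site bases that sit in a stem."""
    return stem / (stem + single)


for rna, (stem, single) in BINDING_SITE_BASES.items():
    print(f"{rna}: {stem_fraction(stem, single):.1%} of {stem + single} bases are in stems")
```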



The pattern that most ribosomal protein binding sites that are malleable according to IUPAC touch stem bases holds for the whole ribosome.





Hoogsteen bases



For an image of a Hoogsteen edge see Figure A 



RNA Base Pair Families


https://en.wikipedia.org/wiki/Non-canonical_base_pairing



I have started a datasheet where I check into these bases


Guanine’s Hoogsteen edge pairs with an arginine side chain, or an adenosine H edge with asparagine/glutamine


Omei Turnbull, Player Developer

Hypothesis: Base repeats of 5 or more are causing transcription failures in the ISAT experiment, resulting in wasted energy and reduced protein synthesis.

Although we are focussing (for good reasons) on ribosome folding, there is another possibility I think we should test for -- transcription efficiency in the ISAT experiment.

In the ISAT experiments, the ribosome is first transcribed from ribosomal DNA and then the fluorescent protein is made from messenger RNA. If there is anything that makes transcription less efficient than in vivo, the end result would show up as lowered total fluorescence, because the fluorescence is ultimately limited by the amount of chemical energy (supplied by pyruvate) that the experiment starts with.

In Eterna labs, we have seen in the past that repeated bases can interfere with the polymerization of complementary nucleic acid chains (which are highly analogous to transcription, but usually distinguished with separate terms). This was very much an issue in the early labs, where repeated bases interfered with the DNA -> DNA polymerization ("duplication") associated with amplifying the DNA. We've also seen it in the RNA->DNA polymerization ("reverse transcription") of poly(A) sequences in the SHAPE labs. It may well have occurred in the transcription process of all the DAS lab experiments, but effectively hidden, either by disallowing more than 4 consecutive Gs or Cs in lab puzzles, or by experimental protocols that compensate for the lower RNA production of sequences that include long stretches of As that we saw in the early riboswitch puzzles.

I looked at the WT 23S sequence and there are 12 segments with a base repeating 5 or more times - 7 poly(G), 1 poly(C) and 4 poly(A). Of these, all but two of the poly(A) segments contain at least one mutation that would break up the sequence without causing an IUPAC violation. I think it's worth using a synthesis slot or two to see whether simply breaking up these sequences has any effect on the experimental results.
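
For anyone who wants to locate such repeats in their own design, here is a minimal Python sketch that scans a sequence for runs of 5 or more identical bases. The function name and the toy sequence are made up for illustration; the actual 23S sequence is not reproduced here.

```python
import re


def find_base_repeats(sequence: str, min_length: int = 5):
    """Return (1-based start, base, run length) for every run of at least
    min_length identical bases."""
    pattern = re.compile(r"([ACGUT])\1{%d,}" % (min_length - 1))
    return [
        (match.start() + 1, match.group(1), len(match.group(0)))
        for match in pattern.finditer(sequence)
    ]


# Toy sequence with one run of five Gs and one run of six As.
toy = "ACGGGGGUAAAAAAC"
print(find_base_repeats(toy))  # [(3, 'G', 5), (9, 'A', 6)]
```

Running the same function on a design before and after mutation shows which repeats were broken up and whether any new ones were created.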

I'll suggest the hashtag #transcription be part of the title or description of any submissions designed to test this hypothesis.


DigitalEmbrace

  • 73 Posts
  • 44 Reply Likes
Great idea. I'd been thinking about addressing base repeats with constraints but this is even better. Are the poly(G) and poly(C) more likely to cause synthesis issues than the poly(A), or is that unknown?
Omei Turnbull, Player Developer
My guess is that this would be true. But that really is just my guess. After all, these repeated sequences must not be much of a problem in vivo, and I haven't come across anything in the scientific literature that directly addresses this question.
Eli Fisker
I can add that I killed the one C repeat in my 23S lab design. 

https://getsatisfaction.com/eternagame/topics/different_types_of_necks_and_their_effects_on_the_main_design





2.17: 9367747 (Eli Fisker)


Eli Fisker
It turned out there are a good deal more designs that removed or even created base repeats. Here comes a list.


https://docs.google.com/spreadsheets/d/1bczkeBAabQK6_JEb_cRKDYA4Edt-pAUx-TVLe6JSu2M/edit?usp=sharing

Creating a longer base repeat didn't necessarily cause total chaos. Of the 4 cases where longer repeats were created, it went OK in the 2 designs that had a low mutation count.

Similarly, removing a base repeat didn't necessarily mean design success. The 3 designs that removed a longer base repeat and also went fairly well were designs with fewer mutations.

I dug up the data from 5S, 16S and 23S Helix Map

There were 2 sets of base repeats in the 16S data where 2 and 3 designs had the same base repeat removed at the same base: base 670 and base 736. These designs didn't all meet the same fate. There was one such set in the 23S data, with a 5 base G repeat removed at base 188. These two designs fared badly.
Eli Fisker
I can add that all these base repeats that were either prolonged or shortened were changed in accordance with IUPAC. No violations.
Eli Fisker

Ghost ribosomal proteins


I have realized that there are a lot more parts of the ribosome for which we have not yet found out where they touch the RNA part of the ribosome.


Here is a part of the journey that led me to realize this. I was interested in watching the ribosome in action, with details of its movements. So I have been asking around.


First a bit from the silly department: I found out that science papers generally show the ribosome with the large subunit on the top and the small subunit at bottom. Skull like. I wonder if it has any preferred orientation in space? I mean it has as soon as it meets a mRNA. :)




Different ribosomes, different proteins caught on the film


I asked Omei what I should search for in PDB if I wanted to see the motions of the ribosome during translation. 


Omei: In PDB site, I entered 'Structural characterization of mRNA-tRNA translocation intermediates' into the search and got a bunch of results. It looks like those that start with 4V6 all come from one paper.


I started to watch them. Instead of finding movements, I found something else. 


I found a ghost ribosomal protein that is not present in the 4YBB structure Rhiju based his overviews on.

L31. It's kind of an elastic protein binding together the LSU with the SSU.


In 4V6O this spaghetti protein is curled up, while in 4V6P it is more stretched out.






4V6O (left) and 4V6P (right), with 5S on top




L31, 5S with both tRNA's plus mRNA 



I find it particularly interesting that L31 is touching 5S. (Since I believe 5S is a switch.) 


Also there is a beautiful symmetry to this. There is a balance between the parts: one tRNA on each side of the L31 protein axis. Similarly, 5S sits in between the tRNA positions as well.



Bridges between the ribosome subunits


By the way, I found out that L31 was L31 by hovering over the protein in Chimera. That would give me the name of the chain. In this case B2 for both of them. (Heads up, the chain names change a lot in Chimera, so one always has to check the PDB entry.) Then I searched for B2 in one of the PDB entries:


Finding a name on a protein: 



With the protein name in hand I could dig up papers. Here is the description I found: 


" In addition, we show that the failure to identify L31 in many ribosome preparations is probably due to the protein's loose association with the ribosome and its ability to form various intramolecular disulfide bonds, leading to L31 forms with distinct mobilities in gels."

https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1574-6968.1999.tb08816.x


So L31 is an intersubunit bridge. The only pure protein of such a kind. There is a whole series of them and I haven't yet caught them in my spreadsheet. 


So what makes L31 allow the stretch? Is that protein switching?


L31 doesn't look like any of the other proteins I have seen. Zero alpha helices. No beta sheets either.


Since this L31 sort of connects and holds together the LSU and SSU, what about connecting L31 to its two nearest ribosomal proteins, S14 and L5, in one long protein? I'm aware that making longer ribosomal proteins hurts the assembly time of the ribosome.


E. coli with "fused proteins", S14 to L31 to L5




Antje shared a paper with me: Transcription Increases the Cooperativity of Ribonucleoprotein Assembly. (Sorry, it is paywalled) 


Here is the main point I got from the paper. The ribosome starts to fold up with the ribosomal proteins even before it has finished being made. So everything is timed: specific ribosomal proteins bind before others and are dependent on other ribosomal proteins. The paper helped find out in what order some of the ribosomal proteins bind.


I was starting to consider that the ribosome may not fold up in the same timed states that would make S15 and L5 fold up in good time, so that they make L31 stick to the LSU and SSU and keep the ribosome together.


L31 looks kind of like shoulder straps keeping the ribosome's pants up. From what I understand it is crucial for protein making. The ribosome won't work, or will be really slow, without it. So my reasoning for wanting to attach its two nearest neighbour proteins was that the ribosome wouldn't fall apart. I have no idea either whether it could then bind to the mRNA with L31 tucked between S15 and L5. I was just thinking about the cool stuff Jewett lab has done with gluing the two ribosomal subunits together. I thought that perhaps the central protein holding the two subunits together would need to be fused to a protein from each subunit to keep the ribosome assembled. Perhaps just one of these proteins. So what I imagined may not be possible if it messes with the binding order of the proteins.


Antje Krüger: I do wonder all these things myself. If we understand the binding order of the proteins better, what actually drives it and how well do they attach. If we understand their function more, we may be able to make smart fusions, alter their structure and don't mess with the assembly itself. Some proteins are involved in locking the rRNA structure, others are thought to be involved in protein synthesis.



Videos of the ribosome in movement


Antje shared this fine ribosome movie with me: 




This made me realize that we didn’t have the binding sites of these assembly factors. 


I bugged Rhiju too and he mentioned: There is a good MRC video that was commissioned by Ramakrishnan.




This is related to the one Antje mentioned. It is in higher resolution, but it lacks the names. So each of them is great in its own way.


If you are curious about Ramakrishnan, here is a fine video introduction to him. It is funny too. 




Omei reminded me about Noller’s ribosome videos. If you check the first one - avi (24 MB) - it is amazing just how much the 5S moves.



Ghost proteins and more motifs to catch


To add in the Motifs and protein binding sites datasheet.



Intersubunit bridges 

L31


Missing SSU proteins

S1


Missing LSU proteins

L1,L7,L8,L10,L12,L26,L34


Assembly factors

IF1, IF2, IF3, RF2

EF-TU, EF-G


tRNA’s

Where they typically touch the rRNA

fmet-tRNA

tRNA-aa

fmet-tRNA-aa


mRNA’s

Where they typically touch the rRNA


Polypeptide

Where the growing amino acid chain typically touches the rRNA


Missing RNA motifs

Kinkturns


Column for modified bases


Messing with the modified bases in the ribosome can disrupt the function of the ribosome.


Omei Turnbull, Player Developer
I can only click on Like once, so here are some extras.
Eli Fisker

Highlighting RNA motifs with a script


A while back I had an idea for highlighting e.g. all A-minor motifs in a design.


I used the script: Report/Mutate/Mark/Unmark Bases (v1.1) https://eternagame.org/web/script/9537916/


DigitalEmbrace volunteered to find the bases that violated motifs in the 23S lab designs.

I simply grabbed the 4 violations she identified in dl2007's 2.24 puzzle (57, 189, 569, 1930), dumped them in the script, and could then highlight them.


With 4 bases highlighted



What I imagine is that each of the RNA motifs can be grouped in a big chunk of comma separated base numbers, so they can be highlighted either as a single group or all in one go. All it would take is to dump the pre-prepared group of bases in the script to see the bases one wants to leave alone, or where specific RNA motifs live.
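
Here is a rough Python sketch of how such pre-prepared groups could be kept and turned into comma separated text for pasting. It assumes the marking script accepts a plain comma separated list of base numbers, which is how it was used above but is not checked against the script itself. The group name and the `to_base_list` helper are hypothetical; only the four base numbers come from the 2.24 example.

```python
# Hypothetical grouping: in practice there would be one entry per RNA motif
# type. Only the four 2.24 bases below come from the post.
MOTIF_GROUPS = {
    "2.24 motif violations": [57, 189, 569, 1930],
}


def to_base_list(*groups) -> str:
    """Merge one or more groups of base numbers into a sorted,
    duplicate-free, comma separated string."""
    bases = sorted({base for group in groups for base in group})
    return ",".join(str(base) for base in bases)


# A single group...
print(to_base_list(MOTIF_GROUPS["2.24 motif violations"]))  # 57,189,569,1930
# ...or every group in one go.
print(to_base_list(*MOTIF_GROUPS.values()))
```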




Super conservative list of safe mods 


I have pulled together the bases in the ribosome that do not yet have a function, like an RNA motif or a ribosomal protein binding site, pinned to them. It is a list of ideas for good places to start mutating.


However when I wanted to demonstrate how we could highlight all these “safe” bases ingame, I ran into trouble. 


The new game markers are different. I can't mark the bases the way I did before. I also tried to remove all marks before running the first script. If some bases are highlighted already, I use this script to clear them: Mark Mutations (v0.7) (Eli's copy) https://eternagame.org/web/script/9597819/


Anyway, I give you the base lists. The ability to highlight the bases in the lab puzzles I can’t give.


16S


Very conservative list of safe mods (330 bases)

45,48,66,74,75,76,78,79,80,81,82,83,85,86,87,88,89,90,91,93,95,96,98,100,103,121,122,123,124,126,134,138,139,140,141,143,150,152,154,155,156,157,158,163,164,165,166,167,168,169,179,181,183,190,199,200,201,207,208,209,210,211,212,216,217,218,219,220,222,223,224,225,226,232,238,241,248,250,252,256,257,268,269,274,278,279,285,289,293,295,304,307,311,316,317,320,333,336,337,371,379,384,396,423,425,434,435,440,441,442,443,444,445,453,454,456,457,458,459,460,461,462,463,469,470,471,472,473,474,475,476,477,478,479,480,484,489,490,491,492,497,513,576,591,592,593,594,601,602,610,614,626,631,632,638,639,646,647,648,649,650,660,661,662,665,670,673,679,681,682,709,711,722,733,736,738,743,744,745,746,747,748,752,755,760,761,762,771,776,784,798,808,811,812,819,822,833,837,838,839,840,841,842,843,844,845,846,847,848,849,851,852,853,854,863,896,903,904,965,987,988,999,1000,1001,1002,1006,1007,1008,1010,1011,1018,1019,1021,1022,1023,1036,1037,1038,1039,1040,1041,1076,1099,1121,1127,1129,1133,1134,1135,1136,1137,1138,1139,1140,1141,1145,1156,1163,1168,1173,1183,1189,1217,1218,1243,1244,1245,1246,1254,1257,1258,1260,1262,1263,1264,1265,1270,1271,1272,1273,1274,1275,1277,1279,1281,1283,1284,1285,1286,1292,1293,1294,1310,1327,1354,1355,1356,1362,1366,1367,1421,1422,1423,1424,1427,1428,1429,1430,1436,1439,1443,1444,1450,1451,1452,1453,1456,1462,1464,1470,1471,1472,1473,1474,1475,1476,1477,1478,1479,1508



Super conservative list of safe mods (251 bases)

45,48,74,75,76,77,78,79,82,85,87,90,91,92,93,95,96,100,122,124,126,134,138,139,140,143,154,155,156,157,158,163,164,165,166,167,168,183,199,200,201,206,207,208,209,210,211,212,213,216,217,218,219,220,223,224,225,226,241,248,250,256,257,268,269,285,289,293,304,311,316,317,320,333,336,337,396,425,434,435,440,441,442,443,444,445,453,456,457,458,459,460,461,462,463,469,470,471,472,473,474,475,476,477,479,489,490,491,492,497,513,591,592,593,594,601,602,614,626,631,632,638,639,646,647,648,649,650,660,661,662,670,679,681,682,709,711,736,738,743,744,745,746,747,748,761,762,771,784,798,808,837,838,839,841,843,844,845,847,848,849,852,854,896,903,904,965,999,1000,1001,1002,1006,1007,1008,1010,1011,1018,1019,1021,1022,1023,1038,1039,1040,1041,1076,1099,1121,1133,1134,1135,1136,1137,1138,1140,1141,1163,1168,1173,1183,1189,1243,1244,1245,1246,1257,1263,1264,1265,1270,1271,1272,1279,1281,1284,1286,1292,1293,1294,1310,1327,1355,1362,1367,1421,1422,1423,1424,1427,1428,1429,1430,1436,1439,1443,1444,1451,1452,1456,1462,1464,1470,1471,1472,1473,1474,1475,1476,1477,1478,1479
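
Since the two 16S lists are plain comma separated numbers, they are easy to compare with set operations. The Python sketch below does that for the bases up to 100 only (copied from the start of each list above) to keep it short; the variable and function names are illustrative.

```python
def parse_bases(text: str) -> set[int]:
    """Turn a comma separated list of base numbers into a set of ints."""
    return {int(token) for token in text.split(",") if token.strip()}


# Excerpts: every base numbered 100 or lower from each 16S list above.
VERY_16S = "45,48,66,74,75,76,78,79,80,81,82,83,85,86,87,88,89,90,91,93,95,96,98,100"
SUPER_16S = "45,48,74,75,76,77,78,79,82,85,87,90,91,92,93,95,96,100"

very = parse_bases(VERY_16S)
super_ = parse_bases(SUPER_16S)

only_in_very = sorted(very - super_)
only_in_super = sorted(super_ - very)

print("Only in very conservative:", only_in_very)
print("Only in super conservative:", only_in_super)
```

For this excerpt the second line prints `[77, 92]`, so the super conservative list is not a strict subset of the very conservative one in this range; whether that is intended may be worth a second look in the spreadsheet.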




23S


Very conservative list of safe mods (689 bases)

1,2,4,9,10,11,12,20,21,22,26,34,40,41,43,50,51,56,60,62,67,68,72,78,79,81,82,92,94,101,103,104,105,107,108,109,113,114,116,121,131,132,133,134,135,136,137,138,140,141,142,144,145,146,147,148,149,150,151,152,153,154,155,156,157,160,163,169,170,171,172,173,174,175,176,177,179,183,184,185,186,192,203,208,212,217,218,219,228,229,232,236,239,240,246,261,263,264,267,269,270,273,284,285,286,287,295,301,302,303,305,312,313,314,315,316,326,331,341,342,343,356,367,368,369,370,375,376,377,382,387,392,393,394,403,405,406,412,421,425,435,436,438,439,440,469,474,475,486,488,490,491,504,509,517,520,522,524,540,541,542,543,544,549,550,551,552,554,577,594,595,596,602,610,612,613,618,623,640,641,642,646,647,648,664,665,666,669,677,680,681,696,697,700,702,703,707,709,710,711,712,717,719,720,721,722,735,737,738,741,742,749,754,755,756,758,765,766,774,777,785,787,800,816,822,828,838,840,841,842,843,844,845,846,847,848,853,854,855,857,866,875,876,877,879,880,883,884,885,886,887,888,892,893,894,896,897,898,899,901,902,904,914,916,924,925,931,932,933,934,935,936,937,938,940,947,949,950,961,963,964,972,978,979,980,982,984,985,994,996,1001,1002,1004,1008,1013,1014,1015,1016,1017,1018,1020,1026,1033,1041,1044,1045,1047,1051,1053,1065,1070,1078,1083,1093,1106,1108,1110,1114,1115,1116,1119,1144,1145,1146,1147,1148,1149,1150,1154,1159,1171,1191,1200,1201,1207,1208,1209,1210,1211,1216,1218,1219,1220,1221,1222,1227,1228,1229,1230,1231,1232,1233,1238,1239,1244,1269,1273,1280,1290,1291,1292,1293,1300,1303,1304,1306,1311,1316,1318,1319,1326,1332,1333,1336,1347,1348,1349,1355,1356,1375,1376,1382,1383,1385,1387,1396,1400,1402,1605,1606,1607,1622,1624,1625,1626,1629,1630,1633,1634,1636,1637,1639,1640,1659,1661,1679,1681,1682,1683,1684,1697,1704,1705,1706,1709,1710,1711,1712,1723,1746,1747,1748,1749,1751,1752,1757,1758,1760,1761,1762,1764,1765,1766,1771,1772,1793,1815,1822,1831,1845,1855,1856,1859,1860,1861,1862,1868,1870,1873,1880,1881,1882,1883,1886,1887,1896,1926,1927,1928,1934,1935,1937,1938,1940,1956,1957,1958,1974,1975,1979,1980,1982,1986,1987,1988,1989,1999,2001,2005,2006,2021,2026,2029,2035,2037,2042,2043,2047,2070,2083,2088,2089,2097,2098,2099,2100,2101,2102,2103,2104,2105,2106,2107,2108,2109,2113,2116,2121,2125,2137,2138,2139,2140,2141,2142,2146,2149,2150,2151,2152,2153,2154,2162,2163,2164,2165,2166,2177,2178,2181,2182,2183,2184,2185,2186,2187,2188,2189,2190,2191,2192,2194,2201,2205,2206,2207,2209,2210,2211,2213,2214,2215,2217,2218,2219,2220,2221,2223,2224,2228,2233,2236,2237,2240,2289,2290,2292,2297,2299,2300,2301,2317,2318,2319,2320,2321,2322,2338,2339,2340,2342,2345,2360,2363,2373,2380,2399,2402,2405,2413,2458,2461,2473,2489,2516,2525,2533,2534,2543,2568,2586,2625,2628,2629,2640,2643,2644,2645,2646,2649,2650,2651,2652,2659,2669,2670,2671,2678,2682,2691,2692,2699,2700,2712,2713,2734,2735,2736,2762,2763,2765,2768,2769,2770,2783,2789,2790,2791,2792,2793,2794,2795,2796,2797,2798,2799,2800,2801,2802,2803,2804,2805,2806,2807,2808,2812,2813,2818,2819,2820,2825,2827,2828,2833,2834,2841,2842,2843,2844,2852,2853,2854,2855,2856,2860,2861,2862,2863,2866,2871,2872,2877,2888,2891,2893,2895,2901,2902,2903


Super conservative list of safe mods (454 bases)

1,2,4,11,20,21,22,34,40,41,43,50,56,81,92,94,101,105,109,113,114,121,131,132,133,134,135,136,137,138,140,142,144,145,146,147,148,150,151,152,153,154,155,156,157,163,169,170,171,172,173,174,175,176,179,185,186,208,218,236,239,246,261,263,264,267,269,270,273,284,285,286,287,295,301,302,303,313,314,315,316,326,331,341,342,343,356,367,368,369,370,375,376,377,382,387,392,393,394,403,405,406,421,425,435,436,438,439,440,474,486,490,504,520,524,540,541,542,543,544,549,550,551,552,594,595,596,610,613,618,640,648,664,665,666,680,681,696,697,702,703,707,709,710,711,712,717,719,720,721,722,737,742,755,765,766,777,785,787,822,828,840,841,842,843,846,847,848,853,875,876,877,879,880,883,884,885,886,887,888,892,893,894,896,897,898,901,902,904,914,924,925,933,935,936,937,938,949,950,978,979,982,985,996,1004,1008,1013,1014,1015,1016,1017,1026,1033,1041,1044,1045,1051,1053,1070,1078,1106,1108,1114,1115,1119,1145,1146,1147,1148,1149,1150,1159,1171,1201,1207,1211,1216,1218,1219,1220,1221,1222,1227,1228,1229,1230,1231,1232,1233,1239,1244,1269,1273,1280,1290,1291,1292,1293,1304,1306,1311,1316,1318,1336,1347,1349,1355,1356,1375,1376,1382,1387,1400,1622,1624,1639,1659,1679,1683,1684,1704,1705,1710,1711,1712,1746,1747,1748,1751,1752,1757,1758,1761,1764,1765,1766,1831,1845,1860,1861,1862,1868,1873,1880,1881,1882,1896,1957,1974,1975,1982,1986,1987,1988,1989,2001,2042,2070,2083,2088,2089,2097,2098,2099,2100,2101,2102,2104,2106,2107,2108,2113,2138,2139,2140,2141,2142,2146,2149,2150,2151,2152,2153,2162,2163,2164,2165,2178,2181,2182,2183,2185,2187,2188,2189,2190,2191,2192,2194,2201,2205,2206,2209,2210,2215,2218,2219,2221,2228,2236,2240,2292,2299,2300,2301,2317,2338,2339,2340,2363,2373,2380,2399,2402,2413,2461,2473,2489,2525,2533,2534,2628,2640,2643,2646,2649,2650,2651,2652,2669,2670,2671,2678,2699,2700,2713,2735,2736,2762,2768,2769,2783,2791,2792,2793,2794,2795,2796,2797,2798,2799,2801,2802,2803,2804,2805,2806,2808,2812,2818,2819,2820,2827,2828,2833,2841,2842,2843,2844,2852,2853,2854,2855,2856,286
1,2862,2863,2872,2877,2888,2895,2901,2902,2903



Here are the data sheets with the details for the individual bases in the ribosome. 


16S Motifs and protein binding sites

23S & 5S Motifs and protein binding sites




Still Missing Ribosome Partners



About 1/5 of the actors that interact with the ribosome are still missing from the spreadsheet. However, the list has gotten shorter. Antje has helped me with data for some of them. 


Here are the ones still missing. 


Intersubunit bridges 

L31


Missing ribosomal proteins 


S1, L1, L7, L8, L10, L12, L26


Assembly factors

IF1, IF2, IF3, RF2

EF-Tu, EF-G


tRNAs

Where they typically touch the rRNA

fMet-tRNA

tRNA-aa

fMet-tRNA-aa


mRNAs

Where they typically touch the rRNA


Photo of Eli Fisker

Eli Fisker

  • 2289 Posts
  • 518 Reply Likes
It dawned on me that I could illustrate the safe mods list in the round 1 ribosome lab puzzles.

Here is what I did:


1) Copy the base numbers out of the "Very conservative list of safe mods" column in the 5S Motif spreadsheet:

5,6,7,8,9,13,20,23,40,50,51,52,53,55,73,75,76,85,88,96,98,104,105,106,107,109,111,112,113,114,115,116,117,118,119,120

2) Open and reset the 5S lab puzzle so it is unmutated. 

3) Call the [Booster] Report/Mutate/Mark/Unmark Bases (v1.1) (Eli's copy) (0) 

 


4) Insert the base numbers in the booster





Very conservative list of safe mods (28 bases)






Super conservative list of safe mods (23 bases)



Now there are a lot fewer bases to concentrate on. This is what the data sheet can potentially do for the 16S and 23S puzzles too. 
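The copy-and-paste in step 1 can be sketched in code. This is only an illustration: `parse_base_list` is a hypothetical helper (it is not part of the booster or the spreadsheet), and it assumes the list is plain comma-separated integers. It tolerates a stray leading comma and a line wrap that splits a number in two, both of which can happen when a long list is pasted into a post.

```python
def parse_base_list(text):
    """Turn a pasted, comma-separated string of base numbers into a sorted list of unique ints.

    Hypothetical helper for illustration only.
    """
    # Remove all whitespace first, so a number that was wrapped across two lines is rejoined.
    cleaned = "".join(text.split())
    bases = set()
    for token in cleaned.split(","):
        if not token:
            # Skip empty fields caused by a leading, trailing, or doubled comma.
            continue
        bases.add(int(token))
    return sorted(bases)


pasted = ",5,6,7,8,9,13,20,23"
print(parse_base_list(pasted))  # prints [5, 6, 7, 8, 9, 13, 20, 23]

# Rebuild a clean string that can be pasted into the booster's input field.
print(",".join(str(base) for base in parse_base_list(pasted)))
```

The cleaned string is what would go into the booster in step 4.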


Photo of Eli Fisker

Eli Fisker

  • 2289 Posts
  • 518 Reply Likes
MasterStormer has been really helpful with the base marker and I can now do exactly what I want: mark the conservative list of safe base mods. 

Here is how to do it: 

1) Install the booster Report/Mutate/Mark/Unmark Bases (v1.1)


For how to install boosters see AndrewKae's Quick Start Guide to Using Scripted Tools in Eterna

2) Open one of the ribosome lab puzzles

3) Call the booster from the menu: 



4) You can find the safe list bases in this forum post or the datasheets themselves: 

Highlighting RNA motifs with a script

16S Motifs and protein binding sites

23S & 5S Motifs and protein binding sites


5) Dump the chosen set in the booster, here the superconservative list of safe mods for the 16S puzzle: 



6) Voila: Bases highlighted

Now you can see the potential sweet spots for modifying. If you change any base, the markers will disappear and you will have to call the script again to redo the highlighting. 



Now have some ribosome fun! :)

P.S. MasterStormer made me aware that the new marker also highlights the IUPAC bases in the same go. So we get a few extra base markers. 
(Edited)
Photo of emmarockit

emmarockit

  • 3 Posts
  • 8 Reply Likes

New experimental data for the pilot round’s 16S and 23S designs

 Almost four weeks into the first round of the ribosome challenge, I have a third set of iSAT data for the pilot round designs for you. This time, I challenged the designs to fold and assemble in iSAT under RNA folding stress conditions: low magnesium (Mg) and low temperature.


Aim of the ribosome challenge

Before I explain the results, let me first explain what we can learn from these data. The aim of the ribosome challenge is to design a stabilized ribosome which folds more easily and is much more stable than the wild-type ribosome (for more information, please look back to the previous blog post Andy Watkins and I wrote mid-November, https://eternagame.org/web/blog/9618257/). This means that an EteRNA design has to beat the wild-type ribosome under standard iSAT conditions, which have been carefully optimized to maximize ribosome folding and consequent protein production. In principle this should be possible, because the wild-type ribosomal RNAs, when synthesized in iSAT, do not fold, assemble, and perform ideally – evolution has tested and selected ribosomal RNA sequences in living cells and not in the test tube! Hence, if life is robust, the ribosome ought to be evolvable, and thus optimizable, for this specific new environment. And therefore, it should be possible for EteRNA players to come up with ribosomal RNA variants that more easily and stably fold and assemble into better performing ribosomes in the test tube than the wild-type does.

 

What can we learn from testing EteRNA ribosome designs under folding stress conditions?

 

In my previous iSAT experiments, I compared the designs to wild-type ribosomes under standard iSAT conditions with optimal Mg and temperature regimes. To get more insight into the designs’ folding behavior, I also tested them in iSAT under “no PEG and no extra DTT” conditions. As a reminder, DTT (dithiothreitol) is an antioxidant (prevents oxidative damage) which helps ribosomal subunit synthesis and assembly. PEG is a crowding agent that creates a more cell-like environment, which might keep assembled ribosomes from falling apart. Thus, omitting PEG and DTT placed the ribosomes under a bit of extra stress, where conditions were not optimal for folding and function.

 

 

How do magnesium and temperature affect RNA folding?

 

To get more insight into the folding behavior of the pilot round’s 16S and 23S designs, I tested the designs in iSAT in low Mg and low temperature regimes. Mg generally helps RNA folding through two mechanisms. First, there is a general electrostatic effect, since the RNA backbone is a polyanion (multiple negative charges) and Mg is positively charged. Second, there can be specific interaction geometries that are particularly favorable when the Mg includes phosphate oxygens in its octahedral coordination geometry. (Some of these terms might be unfamiliar, and that’s okay. Basically, Mg tends to coordinate with six electronegative atoms, especially oxygen or nitrogen, and the best geometry for that is “octahedral” – like three perpendicular x,y,z coordinate axes.) One classic example of an RNA structure that is stabilized by a specific Mg binding interaction is the HCV IRES Domain IIa (PDB ID: 2PN3). So, lowering the Mg concentration asks a design to fold into the target structure with less “help” from Mg.

 

Using a lower temperature, in contrast, has multiple effects. First, it slows down RNA synthesis, which should give the newly synthesized RNA more time to fold. Specifically, folding may be more local and co-transcriptional – that is, if the first 20 nucleotides (A, U, C, G) of an RNA can form a structure, and the first 30 nucleotides of an RNA can form a more stable structure, then you might be slightly more likely to end up with the first, less stable structure because synthesis will be slower relative to folding. This is a simplified picture, of course – I’m speaking in absolute terms rather than probabilities – but I hope it’s illustrative. Second, since lower temperature means there is less thermal energy available, RNA folds have a harder time resampling themselves to “fix” suboptimal structures. If you have a steep energy landscape around a mis-fold, where, say, any set of three base pairing changes is very destabilizing, but there is a set of six base pairing changes that is very stabilizing, then the RNA will have an easy time finding those six changes at high temperature and a very hard time at low temperature.

 

Of course, in iSAT, both Mg and temperature changes also affect the extract’s metabolism needed for RNA synthesis and folding, ribosome assembly and protein production, but this “background influence” should be the same, independent of the tested designs. So, we really only have to think about the influence of Mg and temperature on rRNA folding, in particular the thermodynamic and kinetic effects above. Therefore, the performance of a design under these challenging conditions provides additional information about a design’s folding success and stability.  

 

Experimental data for the 16S designs 

Now, the data: First, the results for the 16S designs. On the top you can see how the designs performed compared to the wild-type under standard iSAT conditions with optimal temperature and Mg (37 °C and 10 mM Mg) in this experiment. Similar to my previous data presentations, on the left you find the GFP production over time and in the middle and on the right the final amount of GFP made (maxGFP). The second row shows how the designs performed at optimal temperature but only 5 mM Mg, and the last row shows the results when, in addition to the low Mg regime, the reaction temperature was also reduced to 30 °C. All data are normalized to the maxGFP of the wild-type under the optimal temperature and Mg regime. In addition, I adjusted the y-axis of the plots individually and did not keep it the same for all.
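For readers who want to redo this kind of normalization on their own numbers, here is a minimal sketch. Everything in it is an assumption made for illustration: the function name, the condition labels, the design name and above all the fluorescence values are invented placeholders, not measurements from this experiment. The only thing taken from the text is the rule itself: every maxGFP value is divided by the wild-type maxGFP measured at 37 °C and 10 mM Mg.

```python
def normalize_to_wild_type(max_gfp, reference_key=("WT", "standard")):
    """Divide every maxGFP value by the wild-type value under standard conditions."""
    reference = max_gfp[reference_key]
    return {key: value / reference for key, value in max_gfp.items()}


# Placeholder raw values in arbitrary fluorescence units (NOT real data).
# Keys are (design, condition); "standard" stands for 37 °C and 10 mM Mg,
# "low_mg" for 37 °C and 5 mM Mg.
raw = {
    ("WT", "standard"): 2000.0,
    ("WT", "low_mg"): 500.0,
    ("example_design", "standard"): 1800.0,
    ("example_design", "low_mg"): 600.0,
}

for key, value in normalize_to_wild_type(raw).items():
    print(key, round(value, 2))
```

With this convention the wild-type under standard conditions is 1.0 by construction, so values in the stressed rows can be compared directly with the standard row.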



 

As you can see, lowering the Mg concentration to 5 mM reduced GFP production by the wild-type and all designs – some were more affected than others. Interestingly, also lowering the reaction temperature to 30 °C was a little beneficial, and here too, some designs were more affected than others.

You probably noticed that in this experiment the designs behaved a bit worse compared to the wild-type and that there is no detector saturation indicated anymore. The main difference to before is that in this new set of experiments I used a lower reaction volume and a different machine for GFP fluorescence detection. This might have affected the designs’ performance compared to the wild-type. Since I don’t know the reason for the discrepancy yet, and don’t have more data, I suggest considering both the old and the new data as valid.


Experimental data for the 23S designs

Now the 23S designs: All data are presented as for the 16S designs. Here too, the stress conditions influence some designs more than others, and the results for the standard condition vary a bit compared to the previous experiment.



 

Coming next, I will analyze the time course data on the left a bit further. These additional analyses might provide more insights into the effects of the stress conditions. So, you can expect additional data soon.

  

Cheers,

Antje (Antje Krüger from the Jewett lab at Northwestern University)

 
(Edited)
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 1008 Posts
  • 324 Reply Likes
I'm very much looking forward to hearing your thoughts on the implications of the time course variations, Antje.
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 1008 Posts
  • 324 Reply Likes
The first thing that stands out for me is that two of our 23S designs (Eli's 2.17 and Gerry's 2.22) clearly did better than the WT under normal temperature but low Mg conditions. These are both low mutation designs (1 and 4, respectively) that have consistently done as well or better than WT, but the improvement was small enough under standard conditions that the differences could be due solely to experimental error. But under this stressed condition, they clearly do better than the WT.

What's more, they have something obvious in common: each breaks up a string of 5 or more of the same base. Eli's design had only one mutation, with the explicit intent to break up a CCCCC sequence. Gerry's design had 4 mutations in all (2 pairs), one of which broke up a GGGGG sequence. Eli's design does somewhat better, but it isn't clear whether that implies breaking up a CCCCC sequence is better than a GGGGG one, or that Gerry's other mutations had some offsetting disadvantage.

Earlier in this discussion, I proposed a hypothesis that repeated sequence could create a challenge during transcription of the ribosomal RNA, possibly causing a partially transcribed rRNA to "fall off" the template DNA before it was fully transcribed. I know that Mg is required for transcription, so it makes sense that a lower Mg concentration would exacerbate that effect, and reducing the occurrences of 5+ identical bases would mitigate the problem.

I had already submitted a 23S sequence that breaks up all the 5+ sequences that aren't mandated by the conservation constraints. But there are other possible tests, and maybe yours will look more appealing than mine when the time comes for voting. 
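For anyone who wants to look for such runs in their own design, here is a minimal sketch. The function name, the toy sequence and the 1-based numbering are assumptions for illustration; it does not check the conservation constraints, so it only lists where runs of 5 or more identical bases occur.

```python
import re


def find_repeat_runs(sequence, min_length=5):
    """Return (start, end, base) for every run of at least min_length identical bases.

    Positions are 1-based and inclusive (an assumption about how bases are numbered).
    """
    # (.) captures one base; \1{n,} requires it to repeat at least n more times.
    pattern = re.compile(r"(.)\1{%d,}" % (min_length - 1))
    return [(m.start() + 1, m.end(), m.group(1)) for m in pattern.finditer(sequence)]


toy_sequence = "AUGGGGGACCCCCCUAGGGGA"  # made-up sequence, not rRNA
print(find_repeat_runs(toy_sequence))  # prints [(3, 7, 'G'), (9, 14, 'C')]
```

The final GGGG in the toy sequence is not reported because it is only 4 bases long.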
Photo of Astromon

Astromon

  • 199 Posts
  • 29 Reply Likes
Great insight! I'd like to add that these two designs had very low mutation counts.
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 1008 Posts
  • 324 Reply Likes
Agreed. Designs that can test a significant hypothesis with just a few mutations can be very valuable because the results are easy to interpret.
Photo of Gerry Smith

Gerry Smith

  • 86 Posts
  • 50 Reply Likes
For the current 23S lab, I combined and submitted the two mutations that interrupted the 4 GC strings that Eli and I had in our designs (and did not include the additional mutations I had made in my design). As I stated in my Slack discussion of my design, I chose to interrupt that 4 GC sequence because of Omei's and Eli's online discussion of that criterion. This is good evidence of the value of sharing hypothesis discussions more broadly so that others can try using them too. 
Photo of Gerry Smith

Gerry Smith

  • 86 Posts
  • 50 Reply Likes
I will also submit a design that includes my other mutations in hopes of seeing whether these hurt the overall design.  The other two mutations were changing two GU pairs to AU pairs.  Perhaps that could tell us if those 2 GU pairs are functionally important.
Photo of DigitalEmbrace

DigitalEmbrace

  • 73 Posts
  • 44 Reply Likes
I voted for Gerry's 2G/C interrupt design (and Omei's design). I think the most focused test holds the most value. I didn't realize my D6 design is the 5G interrupt from 2.22; I don't see much usefulness in testing D6.

I have a couple of other designs that interrupt long repeats; they are marked with #transcription. Onion and Onion 2 have that 2.22 interrupt plus mutations in the outer layers of 23S because there is some evidence mutations further away from the center are less likely to disrupt the PTC.
Photo of Eli Fisker

Eli Fisker

  • 2289 Posts
  • 518 Reply Likes
I have looked through all the 23S 5+ repeats and identified a few base pairs that look particularly promising, because they are not touched by ribosomal proteins, are not part of RNA motifs and are not fixed in IUPAC. They are 406-421, 897-880 and 883-893. 

I have already put up a few designs with a different mutation in the repeat C's I mutated in the pilot round, so I cannot use the above combos. I would rather see them in use, so they are up for grabs. I would like to see both single base and full base pair mutations, as far as IUPAC allows. 

They could be labeled with hashtags such as #sequencevariation #GGGGG #transcription 

Photo of DigitalEmbrace

DigitalEmbrace

  • 73 Posts
  • 44 Reply Likes
The last two pairs are addressed in your (Eli) ReIA designs. Two of the pairs are also addressed in my onion designs, although you may prefer a more pure test.
Photo of Eli Fisker

Eli Fisker

  • 2289 Posts
  • 518 Reply Likes
Thx DigitalEmbrace! I slipped up. You suspect right. I'm happy the pairs are in your onion designs, but I would still like to see them tested alone. 
(Edited)
Photo of Eli Fisker

Eli Fisker

  • 2289 Posts
  • 518 Reply Likes

Speed Evolution by breaking IUPAC


I think that we should not take the IUPAC violations too seriously. I think we can get away with breaking some of them. Here is why. 


Last November Andy and Antje explained how evolution works. Here is an excerpt:


"We like to talk about how sequence covariation can be used to support particular secondary structures. For example, if bases 1 and 10 both vary a lot, but they are always complementary to each other, we have reason to suspect that they are base paired to each other. But this sort of coordinated change isn’t how evolution actually samples sequences over time. Mutation rates in E. coli, depending on who you ask, are between one and two mutations per 10,000 generations per genome, and any particular nucleotide will mutate once per two hundred million generations. That means that the chance that a particular nucleotide, as well as its base pairing partner, will mutate in the same generation are extremely low. Since there have been many individual E. coli, it has surely happened before, but on average probably about once per E. coli lineage. Really, then, the way that covariation works is that a single nucleotide mutates in a way that does not totally break the fold, and then later, its base pairing partner makes a compensatory mutation. Evolution can’t sample big, structural changes; it traces a gradual path of changes that aren’t fatal."

From this post: https://eternagame.org/web/blog/9618257/


Mutations in unpaired (single base) areas have been tested plenty by evolution, but changes to base pairs not so much. 


However, changing a base pair radically takes a double mutation, which is extremely unlikely to happen in one go. So base pairs will tend to change in several steps, like this: a GC may get one mutation and become a GU. This GU may later become an AU. 
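To put a rough number on "extremely unlikely", here is a back-of-the-envelope calculation. It uses only the per-nucleotide figure quoted above (one mutation per two hundred million generations) and adds one assumption of my own: that the two bases of a pair mutate independently of each other. The function name is hypothetical.

```python
def double_mutation_probability(generations_per_mutation):
    """Chance that two specific nucleotides both mutate in the same generation,
    assuming the two events are independent (a simplifying assumption)."""
    p_single = 1 / generations_per_mutation
    return p_single, p_single ** 2


p_single, p_double = double_mutation_probability(200_000_000)
print(f"single base, per generation: {p_single:.1e}")  # 5.0e-09
print(f"both bases, same generation: {p_double:.1e}")  # 2.5e-17
```

Under these assumptions the simultaneous double mutation is about two hundred million times rarer than a single one in any given lineage and generation, which is why a stepwise path is the more likely route.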


However, what has not been thoroughly tested by evolution is a GC becoming a CG or UA in one go. Basically, we can get more bang for our mutational buck if we specifically target every base pair all over the ribosome that is neither involved in RNA motifs nor touched by ribosomal proteins, and double mutate it.


The further away from the original base pair, the better, potentially, in terms of what is least likely to have been tested. That means flips. 


Normally when a single point mutation happens, a G becomes an A (keeping it in the purine family) or a C becomes a U (keeping it in the pyrimidine family). We want to stray as far from normal as possible, to test what evolution has not been testing. So this means we purposefully make G's into C's, C's into G's, A's into U's, U's into A's.


This kind of change is also likely to cause some structural changes. So not every such double mutant is likely to be for the better.
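The two kinds of change described above can be written down as a small sketch. The function names are hypothetical. A "flip" is implemented here as swapping the two partners of a pair, which for Watson-Crick pairs is the same as replacing each base by its complement (GC becomes CG, AU becomes UA).

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def is_transition(old, new):
    """True if the change stays within the purine or within the pyrimidine family."""
    pair = {old, new}
    return old != new and (pair <= PURINES or pair <= PYRIMIDINES)


def flip_pair(pair):
    """Swap the two partners of a base pair, e.g. 'GC' -> 'CG'."""
    return pair[::-1]


print(is_transition("G", "A"))  # True: purine to purine
print(is_transition("G", "C"))  # False: purine to pyrimidine
print(flip_pair("GC"))          # CG
print(flip_pair("AU"))          # UA
```

For a GC or AU pair, both single-base changes needed for a flip are family-crossing changes, which is exactly the kind the post proposes to test on purpose.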



Hypothesis: Evolution by base-pair substitution


What I wish to particularly highlight from Andy and Antje’s post: 


"Mutation rates in E. coli, depending on who you ask, are between one and two mutations per 10,000 generations per genome, and any particular nucleotide will mutate once per two hundred million generations. That means that the chance that a particular nucleotide, as well as its base pairing partner, will mutate in the same generation are extremely low."


My hypothesis is that by systematically double mutating any base pair that isn’t involved in critical structure, we will greatly increase our chance of testing something that evolution has not yet gotten around to.  



Join the lab experiment


Feel free to join the experiment. 


  • Proposed hashtags for the design title: 


#Substitution #IUPAC violation


What to do:


  • Pick a base pair that is not part of an RNA motif

  • and is not touched by a ribosomal protein. 


Spreadsheets with positions of RNA motifs etc. 


16S Motifs and protein binding sites

23S & 5S Motifs and protein binding sites


  • Make a double mutation to a base pair. It is no problem if it violates IUPAC. The more, the merrier. 
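The selection step above can be sketched as a simple filter. All names and numbers in this snippet are made up for illustration; the real positions would come from the motif and protein binding site columns of the spreadsheets.

```python
def candidate_pairs(base_pairs, motif_positions, protein_positions):
    """Keep only base pairs where neither base is in a motif or touched by a protein."""
    excluded = set(motif_positions) | set(protein_positions)
    return [(i, j) for i, j in base_pairs if i not in excluded and j not in excluded]


# Toy data (hypothetical positions, not taken from the spreadsheets).
pairs = [(1, 20), (2, 19), (3, 18)]
motifs = {2}
protein_sites = {18}

print(candidate_pairs(pairs, motifs, protein_sites))  # prints [(1, 20)]
```

Each surviving pair is then a candidate for a double mutation such as a flip.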


Photo of DigitalEmbrace

DigitalEmbrace

  • 73 Posts
  • 44 Reply Likes
Seems reasonable that if a player wants to change a pair in a stem for a specific reason, then that would be a low risk IUPAC violation.
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 1008 Posts
  • 324 Reply Likes
I certainly agree that if there is a solid reason for making mutations that violate the conservation constraints, players should feel free to do that.

On the other hand, I think it's a misinterpretation of Andy and Antje's post to think that double mutations in pairs have never been sampled in nature, and therefore we should purposely test them out as being novel.


The ribosome is filled with conserved pairs that evolution couldn't find under that interpretation. Looking at the 23S base pair 43/436, only the strong-strong pairs have survived (in the far-from-complete database of sequences we found) in gammaproteobacteria, despite the fact that it takes two mutations to convert from one to the other. Less obviously, the 41/438 pair has conserved only 4 of the 16  standard pairings, but there is no way to evolve from one of those to another with only one mutation.
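The "no way to get from one to another with only one mutation" argument can be checked mechanically for any set of observed pairings. The sketch below is an illustration with hypothetical names; the example sets are mine, and the actual pairings seen at 41/438 are not listed in this thread, so they are not used here.

```python
from itertools import combinations


def single_step_links(allowed_pairs):
    """List the pairs of allowed pairings that differ in exactly one of their two bases."""
    return [
        (a, b)
        for a, b in combinations(sorted(allowed_pairs), 2)
        if sum(x != y for x, y in zip(a, b)) == 1
    ]


# Strong-strong pairs only: GC and CG differ in both bases, so there is no one-mutation step.
print(single_step_links({"GC", "CG"}))        # prints []

# A stepwise path such as GC -> GU -> AU shows up as two one-mutation links.
print(single_step_links({"GC", "GU", "AU"}))  # prints [('AU', 'GU'), ('GC', 'GU')]
```

An empty result means every move between the conserved pairings needs two changes, so any intermediate form is one that is not seen in the conservation data.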

Nature clearly finds a way to navigate through what appears to be a double mutation. The conservation data just suggests that the intermediate forms didn't work out well in the long run.
(Edited)
Photo of Brourd

Brourd

  • 461 Posts
  • 84 Reply Likes
@Omei it's possible that the structure of the base pair with the bulge occurs in a dynamic equilibrium. If the bulge is C, the residue at 436 can be a G, while still maintaining a ~similar-ish structure (there will probably be a slight change to the twist of the helix), and then the rescue mutation at 43 can be a C. It's still a 2 mutation path which could explain what has happened.

As for 41-438, we have not explored the double mutation correlation, but it is possible, given it is in the middle of the stem, that it can handle a single non-canonical base pair while only causing moderate strain on the organism. This allows unconventional pathways to be sampled by evolution, though they are not as likely to occur.
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 1008 Posts
  • 324 Reply Likes
Agreed. And I'm confident there are other ways as well. Nature has superpowers that players don't have, namely insertions and deletions.
Photo of Eli Fisker

Eli Fisker

  • 2289 Posts
  • 518 Reply Likes
Ok. I see I missed that nature can swap base pairs in other ways, including indirectly. Astromon did notice that a design he made of my type had a higher delta. The ones I made later also tend to be in the really high range. While I do not have a lot of faith in the accuracy of the deltas of our engines, it may very well be that nature has already searched out the beneficial energies. 

Photo of Brourd

Brourd

  • 461 Posts
  • 84 Reply Likes
@Omei While it's not my area of expertise, I believe insertions and deletions occur at lower levels compared to mutations (SNPs) in the DNA. Insertions and deletions are generally more detrimental to an organism, given a lot of SNPs result in degenerate amino acids for coding and not frame shifts. In the case of rRNA, it's probably not as bad in some instances, but definitely having extra nucleotides in the sequence requires that they are sequestered in some way, or there is a strong possibility the equivalent of a frameshift could occur in the secondary structure.
Photo of Eli Fisker

Eli Fisker

  • 2289 Posts
  • 518 Reply Likes

Neck mutations: Far range stabilization


I have an idea I think we can use in combination with the base-pair substitution idea above. 


Back in the static RNA labs I was testing two almost identical designs, with just one difference: a flipped AU base pair in the neck. 


A neck is the outermost stem, formed by two far-apart strands folding together. For further explanation see Neck definition



Both designs were winners but had different scores, though the difference was within error according to Rhiju.


However, the reason I'm particularly interested in it in relation to base-pair substitution is the following.


Compare the blue in the stems of the two designs. The blue means that this area is stable - not accessible to chemical probes. (The data is shown in full blue and full yellow and not showing nuances. For reading SHAPE data see this post Intro to SHAPE data)


Concrete example of base-pair substitution



Are the blue areas in the same places in both designs? E.g. look at the stem with the triloop. 


What I think is interesting is that it is not the same stem areas that are deemed stable in both designs, despite there being only one base pair flip in the neck that is different between the two designs.


So by flipping a pair in the neck one can change the stability in an area far away.


I think this is yet another way to gain extra mutational power, for just one base-pair substitution. We can potentially have long range effects and change stability far away even without touching this area directly.


Background post: Different types of necks and their effects on the main design


If you are up for testing this too in lab, just add #neck in your lab design title. 



Photo of Gerry Smith

Gerry Smith

  • 86 Posts
  • 50 Reply Likes
Here is a spreadsheet with the current 105 16S designs. About half have 7 or fewer mutations.
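Counting mutations like this comes down to comparing each design with the wild-type sequence position by position. Here is a minimal sketch under the assumption that design and wild type have the same length (no insertions or deletions); the function name and the toy sequences are made up.

```python
def list_mutations(wild_type, design):
    """Return (position, wild_type_base, design_base) for every differing position (1-based)."""
    if len(wild_type) != len(design):
        raise ValueError("sequences must have the same length")
    return [
        (index + 1, wt_base, new_base)
        for index, (wt_base, new_base) in enumerate(zip(wild_type, design))
        if wt_base != new_base
    ]


toy_wild_type = "GGCCCCCAU"  # made-up sequences, not rRNA
toy_design = "GGCCUCCAU"

mutations = list_mutations(toy_wild_type, toy_design)
print(len(mutations), mutations)  # prints 1 [(5, 'C', 'U')]
```

The length of the returned list is the mutation count that would go into a spreadsheet column.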

https://docs.google.com/spreadsheets/d/1iOVRzLREaJX8Sxe5Qc6-nlUUsF2qWbpZIN2oF-2sjHo/edit?usp=sharing
Photo of Eli Fisker

Eli Fisker

  • 2289 Posts
  • 518 Reply Likes
Gerry, I see you have found a way to show us how many mutations there are in the designs. Thx!
Photo of dl2007

dl2007

  • 11 Posts
  • 4 Reply Likes
I am interested: what is the maximum tolerable number of motif and protein binding site violations? For example, Astromon's 23S design with 43 mutations had 13 violations and worked well, while Gerry's design with 32 mutations had only 2 violations but did not work so well. I think the maximum tolerable number of such violations could be 10-15 in total per design, including 2-4 motif violations and 1-2 violations per protein.
It can probably sometimes be beneficial to have several violations if one wants to test a particular hypothesis, like the cleared arcplot in my case.
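A tally like the one discussed here (total, motif and per-protein violations) could be computed along the following lines. This is a sketch under assumptions: the function name, the protein labels and all positions are invented, and counting a base only once when it is both in a motif and in a protein site is my choice, not necessarily how the counts in the spreadsheet were made.

```python
def count_violations(mutated_positions, motif_positions, protein_sites):
    """Count mutated bases that fall in RNA motifs or in protein binding sites.

    protein_sites maps a protein label to the set of bases it touches.
    """
    mutated = set(mutated_positions)
    motif_hits = mutated & set(motif_positions)
    per_protein = {
        name: len(mutated & set(sites)) for name, sites in protein_sites.items()
    }
    protein_hits = mutated & set().union(*protein_sites.values())
    return {
        "motif": len(motif_hits),
        "protein": len(protein_hits),
        "total": len(motif_hits | protein_hits),  # a base hit twice is counted once
        "per_protein": per_protein,
    }


# Toy data (hypothetical labels and positions).
report = count_violations(
    mutated_positions=[5, 12, 40, 41],
    motif_positions={5, 40},
    protein_sites={"protein_A": {12, 13}, "protein_B": {40, 99}},
)
print(report)
```

With the toy data, bases 5 and 40 hit motifs, bases 12 and 40 hit protein sites, and the total is 3 because base 40 is counted once.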
Photo of Brourd

Brourd

  • 461 Posts
  • 84 Reply Likes
@dl2007

To describe it from a purely evolutionary standpoint, violations are just mutations that do not occur with greater than some threshold % frequency out of the entire population of gammaproteobacteria. This doesn't mean they are bad, but it doesn't necessarily mean they are good either. Protein binding sites are usually recognition of the base pairs in the helix (polarity may or may not matter), and RNA motifs are folding into a specific conformation both due to sequence identity as well as steric constraints on the system. Making mutations to either of these without prior knowledge of what the bases are doing is not the greatest way to go about doing things, but it's also perfectly fine in some circumstances. There are several stems where individual mutations would most likely disrupt the structure of the RNA, but are not a part of any major motifs or protein binding site, and therefore mutating the base pairs is "probably" fine. These would be constraint violations, but their identity may not ultimately matter.
(Edited)
Photo of Astromon

Astromon

  • 199 Posts
  • 29 Reply Likes
Or those 2 violations were more critical than any of my 13.
Photo of Brourd

Brourd

  • 461 Posts
  • 84 Reply Likes
Gerry's design also mutates a number of base pairs in the structure. So that could also be why his design did not do as well~
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 1008 Posts
  • 324 Reply Likes
Thank you, Gerry! I like to have a column with links to individual designs, so https://docs.google.com/spreadsheets/d/1m5wRbUFSr4izZ9ZZeb6cI5fnIimYpifcEU1Krnl_B_U/edit#gid=0 is a copy of your spreadsheet, with links added.
Photo of Astromon

Astromon

  • 199 Posts
  • 29 Reply Likes
My design also had a number of mutated base pairs. If he had a greater number, that might mean something, or maybe not. Maybe the base pair mutations he made were the bad ones, and it was not necessarily the fact that they were base pair mutations. 
Photo of dl2007

dl2007

  • 11 Posts
  • 4 Reply Likes
Anyway, whatever the reasons for Gerry's not-so-good result, my point is that a well-working design can have a reasonable number of motif and protein site violations (10-15), and we should not neglect such designs during voting.
(Edited)
Photo of Brourd

Brourd

  • 461 Posts
  • 84 Reply Likes
@Astromon - For clarification, Gerry's sequence mutated a number of base pairs, effectively making them mismatches. Unless your design with 5 votes on the 23S pilot round is not the one that was tested, you did not do that in your sequence. I would say that is probably one of the primary reasons that Gerry's design did not do well compared to your own.
Photo of dl2007

dl2007

  • 11 Posts
  • 4 Reply Likes
@Brourd - Maybe it is evidence that maintaining secondary structure can be important, and it would be interesting to test a design with a good arcplot within the borders of the soft constraints. IMHO.
Photo of DigitalEmbrace

DigitalEmbrace

  • 73 Posts
  • 44 Reply Likes
Indeed, the sequences of other ribosomes are quite different from E. coli's and have motifs forming at different sites. As for protein binding sites, I have no idea how vital each one is. Has anyone analyzed protein sites in our Pilot designs?

I consider the spreadsheet an analysis tool, not another set of constraints.
Photo of dl2007

dl2007

  • 11 Posts
  • 4 Reply Likes
@DigitalEmbrace I have noticed that in Astromon's design with 43 mutations, the protein binding site violations were 1-2 per protein.
Photo of dl2007

dl2007

  • 11 Posts
  • 4 Reply Likes
Continuing my thought about secondary structure: the secondary structure of Astromon's design was stabilized only partially, but that was enough to compensate for 43 mutations. A design with a similar number of mutations but a significantly more stabilized structure (cleared arcplot) could probably work significantly better. I think it is worth checking this round.
Photo of DigitalEmbrace

DigitalEmbrace

  • 73 Posts
  • 44 Reply Likes
I just noticed Eli's post at top of page 2 analyzed protein binding sites.

I agree that stabilizing the secondary structure looks promising.