Ribosome Pilot Challenge

  • Idea
  • Updated 1 day ago
This is a conversation to discuss all aspects of the Ribosome Pilot Challenge, just announced at https://eternagame.org/web/news/9221020/.

Omei Turnbull, Player Developer


Posted 2 months ago


Eli Fisker

Turn DNA sequence into RNA or invert a sequence


A while back Cynwulf shared the fine TextMechanic tool with me. I was experimenting with making switch puzzles from natural sequences and needed to turn some DNA sequences into RNA.

The sequences in science databases are also often inverted compared to what we are used to seeing in EteRNA.

The point is that it helps to have tools that can invert a sequence or turn DNA into RNA. However, TextMechanic has lately gone from being a free tool to allowing only 4 free runs an hour. That is fair enough, but it doesn't cut it for what I want.

So I have made two small scripts that do what I need most. I hope they may be of help to some of you too.

Reverse Sequence Tool
Search and replace character Tool
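Both jobs are simple string operations. Here is a minimal Python sketch, assuming plain sequence text as input; it is not the code behind the two tools, and the function names are made up for illustration:

```python
def dna_to_rna(seq: str) -> str:
    """Search and replace T -> U (upper and lower case), turning DNA text into RNA text."""
    return seq.replace("T", "U").replace("t", "u")


def reverse_sequence(seq: str) -> str:
    """Reverse the order of the bases, e.g. to flip an inverted database listing."""
    return seq[::-1]


if __name__ == "__main__":
    rna = dna_to_rna("ATGCGTTA")
    print(rna)                    # AUGCGUUA
    print(reverse_sequence(rna))  # AUUGCGUA
```

Note that reversing is not the same as taking the complement strand; this sketch only reverses the order.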








worseize

Fatal error

Eli Fisker

New forum post on how we can simplify the ribosome task and use a booster to tell us which unlocked bases we should be careful about mutating.

Omei has kindly shared his summary of which bases are common to all the E. coli relatives for the 5S puzzle.

Image by Omei


For details on the secret code ;) see: 

Using multiple criteria for evaluating mutations
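A summary like this comes down to comparing the relatives column by column. Here is a minimal Python sketch, assuming the relatives' sequences are already aligned to the same length with "-" for gaps; it is not how Omei made his image, and the example sequences are made up:

```python
from collections import Counter


def conserved_positions(aligned: list[str]) -> list[int]:
    """Return 1-based positions where every aligned sequence has the same base (gaps excluded)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    conserved = []
    for i in range(length):
        column = {s[i] for s in aligned}
        if len(column) == 1 and "-" not in column:
            conserved.append(i + 1)
    return conserved


def variants_at(aligned: list[str], pos: int) -> dict[str, int]:
    """Count which bases the relatives use at a 1-based position."""
    return dict(Counter(s[pos - 1] for s in aligned))
```

For the made-up relatives `["GCAUGC", "GCAUAC", "GCGUGC"]`, `conserved_positions` returns `[1, 2, 4, 6]`, and `variants_at(..., 5)` shows that position 5 is G in two relatives and A in one.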



spvincent

The 16s rRNA contains highly conserved regions (across species) and also variable regions. Perhaps it doesn't matter given the overall goals of the project but would it make sense to lock the conserved regions? Right now the locking seems to be based on tertiary bonding.

jandersonlee

We're working on a booster that indicates the species variations as soft constraints using colored markers to indicate known variations, black for mods that match the inter-species variants, and white for mods that don't. So far it has data for 5S and 23S. Hopefully 16S will be added soon. https://eternagame.org/web/script/9311310/
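The black/white rule can be illustrated with a small lookup. Here is a minimal Python sketch, assuming the known inter-species variants are stored as a mapping from 1-based position to the set of bases seen in relatives. It is not the booster's code, the data layout is a guess, and it leaves out the coloured markers for the variants themselves:

```python
def classify_mutations(native: str, design: str, variants: dict[int, set[str]]) -> dict[int, str]:
    """Label each mutated position 'black' if the new base is a known variant, else 'white'."""
    if len(native) != len(design):
        raise ValueError("native and design must have the same length")
    labels = {}
    for pos, (old, new) in enumerate(zip(native, design), start=1):
        if old == new:
            continue  # unchanged bases get no mark
        labels[pos] = "black" if new in variants.get(pos, set()) else "white"
    return labels
```

With native `"GCAUGC"`, design `"GCGUAC"` and made-up variants `{3: {"A", "G"}, 5: {"G"}}`, the A3G mod is marked black (G is seen in relatives) and the G5A mod is marked white.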

jandersonlee

Added an initial soft constraint for 16S part 2 just now. Still working on parts 1 and 3 and a better part 2.

Eli Fisker


Strategy for stabilizing the ribosome


The ribosome has a dual nature. In localized areas it is a switch, but it is a static RNA at other places.  


For a while I have focused mainly on what I consider switching areas. Now I have decided to treat the ribosome as if it were just another static RNA puzzle. It is of course a bit bigger. ;) No need to worry, though, if you haven’t worked with static RNA labs before.


Here is what static RNA labs like.


  • Static RNA loves having structure variation.


Elements such as stems, end loops, multiloops and bulges tend to differ in size when they are near each other. Luckily the ribosome has already taken care of this for us beforehand. The structure is fixed. See, nothing to worry about, just ignore! :)


  • Static RNA loves having sequence variation.


The more varied the sequence, the better. Luckily the ribosome has mostly taken care of this for us already, too.


However, the larger the puzzle, the bigger the chance that some sequences are repeated or that a strong sequence has multiple potential partners. The more repeated sequences, the less stability.


This is where we can give E. coli a helping hand. While the E. coli ribosome already works as nature made it, we can help it become more stable by targeting repetitive sequences that may make some areas of the ribosome a bit unstable.



Join the #sequencevariation experiment


I will stick the hashtag #sequencevariation on my designs. You are welcome to join. The more, the merrier. Let’s give our e.coli ribosome bug a bit more backbone. A stabler one at least. :)



Script help from jandersonlee


To make the task much easier, jandersonlee has brewed up two great scripts. (Plus many more)


Thx to Jeff for the fine scripts.



Finding the repeat copies and partners

Long repeat finder 1.13


This script identifies sequences of 6 or more bases that occur in more than one copy (match sequences). It also identifies the potential partner sequences for each of these match sequences.
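The two jobs the script does can be sketched in a few lines. This is a minimal Python sketch, not the Long repeat finder's actual code. It assumes that a match sequence is any k-mer (k = 6 by default) that occurs more than once, and that a partner is any window that can pair antiparallel with it using Watson-Crick or G-U wobble pairs. The wobble assumption seems consistent with GGGGGG being reported with a partner later on, even though this puzzle has no run of six C's.

```python
from collections import defaultdict

# Watson-Crick pairs plus G-U wobble (assumption: wobble pairs count as partners).
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def find_repeats(seq: str, k: int = 6) -> dict[str, list[int]]:
    """Map every k-mer seen more than once to its 1-based start positions."""
    starts = defaultdict(list)
    for i in range(len(seq) - k + 1):
        starts[seq[i:i + k]].append(i + 1)
    return {kmer: pos for kmer, pos in starts.items() if len(pos) > 1}


def find_partners(seq: str, match: str) -> list[int]:
    """Return 1-based starts of windows that could pair antiparallel with `match`."""
    k = len(match)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Antiparallel: the first base of `match` pairs with the last base of the window.
        if all((m, w) in PAIRS for m, w in zip(match, reversed(window))):
            hits.append(i + 1)
    return hits
```

In this sketch, longer repeats show up as runs of overlapping 6-mers rather than as one merged match. For a puzzle you would call `find_repeats(puzzle_seq)` and then `find_partners(puzzle_seq, match)` for each match of interest. I have not checked that this reproduces the script's output exactly.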


1) I open the E. Coli Ribosome 23S rRNA domain III puzzle and copy the original sequence.


GAUAAAGCGGGUGAAAAGCCCGCUCGCCGGAAGACCAAGGGUUCCUGUCCAACGUUAAUCGGGGCAGGGUGAGUCGACCCCUAAGGCGAGGCCGAAAGGCGUAGUCGAUGGGAAACAGGUUAAUAUUCCUGUACUUGGUGUUACUGCGAAGGGGGGACGGAGAAGGCUAUGUUGGCCGGGCGACGGUUGUCCCGGUUUAAGCGUGUAGGCUGGUUUUCCAGGCAAAUCCGGAAAAUCAAGGCUGAGGCGUGAUGACGAGGCACUACGGUGCUGAAGCAACAAAUGCCCUGCUUCCAGGAAAAGCCUCUAAGCAUCAGGUAACAUCAAAUCGUACCCCAAACCGACACAGGUGGUCAGGUAGAGAAUACCAAGGCGCU


2) Then I open the Long repeat finder script and paste in the puzzle sequence:



3) Scroll down and press the Evaluate button.



In this puzzle there was no 6-base sequence that had a twin or triplet copy.


These two long sequences have 5 and 6 potential partner sequences, but they are also rather weak, so they are not my primary interest.



When you have identified a match sequence that you think may be problematic, because there are multiple repeats of it, because it has way too many potential partners, or because it is very strong, you can copy the match sequence and put it into the Mark subseq and complements script. It will then mark the match sequence in red and the partner sequences in blue-green.


4) Next I copy out the match sequence CGGGGC, as it is rather strong and has two partners.




Mark subseq and complements


5) Install the Mark subseq and complements booster


First you need to make your own copy of the script. To use it as a booster, choose Start from copy, select Booster under Script type, and then click Submit script.


For further help on how to use a booster or run a script, check AndrewKae’s fine guide:


Quick Start Guide to Using Scripted Tools in Eterna


6) Go back to your E. Coli Ribosome 23S rRNA domain III puzzle and refresh the page. This will load your new booster. Then, in the lab puzzle, open the booster from the lightning button.


7) Paste in your CGGGGC sequence from the Long repeat finder script.



8) Click OK


9) Click OK to the next box and the script will mark the sequence in RED and the partner sequences in BLUE-GREEN.



10) Now what I look for is how close the partner sequences (BLUE-GREEN) are to the match sequence (RED). In the above case they are rather far apart, so judged on that alone, they are less likely to interfere with each other.


11) Next I switch to native view and check how close the RED sequence gets to the BLUE-GREEN ones.


12) I also switch between engines to watch the native states they predict. In neither of them does the RED sequence pair up with the BLUE-GREEN partners, although the RED sequence does misfold with other sequences. If the RED sequence and one of the BLUE-GREEN sequences paired up somehow, I would likely make a mutation in one of them.


13) Now there is one more sequence I find really interesting despite it having only one potential partner. It is the super strong GGGGGG.



While it may not end up with its partner, there are 2 places in this puzzle partial with 4 C’s in a row. So I think this 6-G sequence could be worth disrupting, as the native folds in different engines show some tendency for these C runs to pair up with the G run.



Repeat C’s marked
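Steps 10 and 11 boil down to asking how far apart in sequence the match and each partner are. Here is a minimal Python sketch, assuming 1-based start positions like those reported by the scripts and measuring the number of bases between the two windows. The function names are made up for illustration:

```python
def sequence_gap(start_a: int, start_b: int, length: int) -> int:
    """Number of bases strictly between two equal-length windows (0 if they touch or overlap)."""
    first, second = sorted((start_a, start_b))
    return max(0, second - (first + length))


def rank_partners(match_starts: list[int], partner_starts: list[int], length: int) -> list[tuple[int, int]]:
    """Return (partner_start, closest_gap) pairs, closest partners first."""
    ranked = []
    for p in partner_starts:
        # Distance to the nearest copy of the match sequence.
        gap = min(sequence_gap(m, p, length) for m in match_starts)
        ranked.append((p, gap))
    return sorted(ranked, key=lambda item: item[1])
```

For a made-up 6-base match at position 10 and partners at 200 and 30, `rank_partners([10], [200, 30], 6)` gives `[(30, 14), (200, 184)]`. This only measures distance along the chain. As noted in the list below, partners that are far apart in sequence can still meet in 3D, and this cannot tell you that.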




What to look for


  1. Look for long sequences that exist in multiple copies

  2. Look for strong sequences with lots of G’s and C’s. These are more likely to want to pair elsewhere, in other words to potentially disturb the overall stability of the ribosome.

  3. If a long and strong sequence has a lot of potential partners, the closest ones matter the most.


When a sequence has multiple partners, it matters where they are. If the potential partners are really far apart in sequence, they are rather unlikely to interact (unless they also happen to be close in 3D space). The RNA folds up as it is made, so many sequences are shielded from each other just by this first folding. So the partners I count as the worst are the closest ones.


  4. The longer and stronger the sequence, the more likely it is to tip towards instability, especially if there are potential partners nearby.

  5. To get an idea of whether your match sequence wants to go elsewhere, check the native state of the puzzle and switch between engines to see what they predict.

  6. I see each of the 16s and 23s partials as individual parts of the ribosome.


When it comes to whether a sequence copy will cause trouble, I think its position in the full puzzle matters too. A strong sequence in the middle has more chances to pair elsewhere than one at either end of the full sequence.


The RNA folds up as it is made, and as soon as a branch (like the puzzle partials for 16s and 23s) has folded, any sequence inside has a good shot at staying put. So the distance to the partner sequences matters a lot. I'm mainly interested in close partners, because those have the best chance to misbehave. So first I look for copy repeats inside the same branch. Next I look for copy repeats across two neighbouring branches, like two 16s or 23s puzzle partials.


If we take the 3 partials of the 16s puzzle, I basically think that a repeat sequence is more detrimental in Partial 2 than in Partial 1 or 3. Copy repeats in Partial 1 have no partner sequences before its base 1 to mispair with. Similarly, copy repeats in Partial 3 won’t have any partner sequences to pair with after the last base. Partial 2, however, can mispair with partner sequences in both Partial 1 and 3.
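Points 2 and 4 can be turned into a rough first-pass sort of candidate match sequences. Here is a minimal Python sketch, assuming "strength" is approximated by simply counting G and C bases. This is a crude stand-in: real pairing strength depends on nearest-neighbour stacking energies, which this ignores. The function names and candidates are made up for illustration:

```python
def gc_count(seq: str) -> int:
    """Count G and C bases, a crude proxy for how strongly a sequence can pair."""
    return sum(base in "GC" for base in seq.upper())


def rank_candidates(candidates: list[str]) -> list[str]:
    """Order candidate match sequences strongest first: more G/C, then longer."""
    return sorted(candidates, key=lambda s: (gc_count(s), len(s)), reverse=True)
```

For example, `rank_candidates(["AUUAAU", "GAAAAGCC", "CGGGGC"])` puts CGGGGC (6 G/C) first, then GAAAAGCC (4 G/C), then AUUAAU. Candidates near the top are the ones worth checking for nearby partners and engine misfolds.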







Brourd

Something to consider for this hypothesis is that we should consult the crystal structure for where certain motifs like the A-minor motif interact with the base pairs of helices. Modifying these may disrupt some tertiary contacts and result in a fairly unstable ribosome.

JR

Where can we find that information?

Brourd

Probably in a pdb of the ribosome like https://www.rcsb.org/structure/4V4A

Eli Fisker


I think getting the A-minor motif base positions for our e.coli is an excellent idea.


"The A-minor motif is by far the most abundant tertiary structure interaction in the large ribosomal subunit; 186 adenines in 23S and 5S rRNA participate, 68 of which are conserved. It may prove to be the universally most important long-range interaction in large RNA structures".

RNA tertiary interactions in the large ribosomal subunit: The A-minor motif


I have been looking for databases that might reveal the positions of the A-minor motifs in our E. coli. This is the closest I got: I found a paper that referred to an RNA motif database:


InterRNA: a database of base interactions in RNA structures



It contains A-minor motifs for Haloarcula marismortui and Thermus thermophilus, but not E. coli.



Here is the full search result:

http://27.126.156.175/interrna/AMlink.php



The G-ribo motif


While I was reading up on the A-minor motif, I stumbled on another group of RNA motifs: the G-ribo motif. There are 8 of them in E. coli. Even better, I have a paper spelling out their positions (Figure 3).


G-ribo: A new structural motif in ribosomal RNA


“Here we present a new RNA structural motif named G-ribo, which has been found in eight different places within the ribosome. This motif represents a specific side-by-side arrangement of two double helices connected by an unpaired region. The juxtaposition of the helices is stabilized through a complex system of specific interactions, which spread over several layers of stacked nucleotides. The location of the identified cases of this motif within the ribosome suggests that at least some of them play an important role in the formation of the ribosome tertiary structure and/or in its function.”


These G-ribo motifs are probably at different base numbers in our puzzles, but there should be enough sequence for us to pinpoint them there. Here is one of them.
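Pinpointing a motif can start with a plain text search of the puzzle sequence for the motif's bases as given in the paper. Here is a minimal Python sketch, assuming you have typed in the motif's sequence. If it occurs more than once, you would still need the surrounding structure to pick the right hit. The function name and example strings are made up:

```python
def find_occurrences(puzzle_seq: str, motif_seq: str) -> list[tuple[int, int]]:
    """Return 1-based (start, end) ranges where motif_seq occurs in puzzle_seq, overlaps included."""
    hits = []
    start = puzzle_seq.find(motif_seq)
    while start != -1:
        hits.append((start + 1, start + len(motif_seq)))
        # Search again one base further on so overlapping hits are found too.
        start = puzzle_seq.find(motif_seq, start + 1)
    return hits
```

For example, `find_occurrences("AAGCUGAAGCU", "AGCU")` returns `[(2, 5), (8, 11)]`. Once one motif base is matched, the difference between its paper number and its puzzle number gives the offset for the rest of that contiguous stretch.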


S521 : Base 506-529 in E Coli Ribosome 16s rRNA part 1





Eli Fisker


From the look of it, Andy and co have locked up most of the G ribo motif bases in the more locked version of the puzzle.




jandersonlee

It looks like the soft constraints cover this too except for the C506,G529 pair: 

Eli Fisker


S861 : Base 266-272 and 300-316 in E Coli Ribosome 16s rRNA part 2


More of the G-ribo bases, but not all, are locked in the puzzle with the larger set of locked bases.





Eli Fisker


S1047 : Base 69-78, 123-128 and 289-296 in E Coli Ribosome 16s rRNA part 3


The sequence matches the paper except at base 291, which is an A according to the paper instead of the U in our puzzle.



Again, most of the G-ribo motif is locked in the puzzle with the larger set of locked bases, but not all of it.
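Checks like the base-291 difference can be automated by comparing the paper's bases with the puzzle's over the same range. Here is a minimal Python sketch, assuming both strings cover the same stretch and that the first character corresponds to the given puzzle base number. The function name and example strings are made up:

```python
def mismatches(paper_seq: str, puzzle_seq: str, first_base: int) -> list[tuple[int, str, str]]:
    """List (base_number, paper_base, puzzle_base) wherever the two sequences disagree."""
    if len(paper_seq) != len(puzzle_seq):
        raise ValueError("both sequences must cover the same range")
    return [
        (first_base + i, p, q)
        for i, (p, q) in enumerate(zip(paper_seq, puzzle_seq))
        if p != q
    ]
```

For example, `mismatches("GGAUCC", "GGUUCC", 289)` returns `[(291, "A", "U")]`, meaning an A in the paper and a U in the puzzle at base 291.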




jandersonlee


S861 : Base 266-272 and 300-316 in E Coli Ribosome 16s rRNA part 2
is partially locked by the soft constraints, but some bases are allowed to vary.


jandersonlee


S1047 : Base 69-78, 123-128 and 289-296 in E Coli Ribosome 16s rRNA part 3
is mostly locked by the soft constraints (all the loop bases and closing pairs, but not all of the next pairs in the stems).

Eli Fisker


L1024 : Base 456-463 and 577-584 in E Coli Ribosome 23S rRNA domain II


L1309 : Base 33-40, 334-342, and 349-355 in E Coli Ribosome 23S rRNA domain III


Base 337 is a G according to the paper, whereas it is an A in our puzzle.




L1642 : Base 27-34 and 354-373 in E Coli Ribosome 23S rRNA domain III






Eli Fisker


L2323 : Base 252-258, 274-282 and 289-296 in E Coli Ribosome 23S rRNA domain V


L2383 : Base 242-248, 300-304 and 341-342 in E Coli Ribosome 23S rRNA domain V



These were the last of the 8 G-ribo motifs. It looks like most of the bases in the 16s puzzle partials are either locked or accounted for by the color booster. However, most of the G-ribo bases in the 23S partials are unlocked.

Eli Fisker


In the quest for the positions of the A-minor motifs, I asked Rhiju if he happened to know a database containing those for e.coli.


He gave us something much better: a list with the base positions of the A-minor motifs, a 2D overview with the A-minor motifs drawn in yellow, plus an extensive list of other motifs.


I have added an image to illustrate his text, but the ribosome images with the motifs highlighted were so big that I couldn’t upload them without losing information. To get the full overview, follow his links.


Here is his reply:


I have a way to look through a 3D rna structure and pull out its motifs using a Rosetta application


I use that to set up ribodraw drawings


the files look like "*motifs.txt" in the ribodraw repository


here is an example:



https://github.com/ribokit/RiboDraw/blob/master/drawings/ribosome/23S_5S/rna_motif/4ybb_23S_5S_RPL.p...


actually I identify lots of different motifs, but you can find A-minors in there with the A_MINOR tag


that being said, I am a bit unhappy at the way these are being tagged -- A-minors usually involve AA dinucleotides docking into helices, and my script currently only highlights one of the A's, the one that makes a clearly identifiable base pair with the helix


I also played a bit last week with rendering the ribosome with the various motifs highlighted


A-minors are gold


here are links:


https://github.com/ribokit/RiboDraw/blob/master/drawings/ribosome/23S_5S/drawing_with_motifs.png



https://github.com/ribokit/RiboDraw/blob/master/drawings/ribosome/23S_5S/drawing_with_motifs_and_ter...



https://github.com/ribokit/RiboDraw/blob/master/drawings/ribosome/16S/drawing_with_motifs.png



So basically, here is what I think we can do with what Rhiju has shipped our way: make a script booster that shows the positions of the A-minor motifs. I think this can be especially helpful when we get the full RNA puzzles. That way we can, as a final step, weed out mutations in our partials that may mess up the 3D structure of the ribosome.


It may also be useful to have a script booster that can highlight the positions of some of the other motifs. This may have to be a motif filter view with options to choose which motif to watch, as several of the motifs seem to overlap at the same spots. Again, I think this may be useful in the final submission phase, when we are to submit 3 designs for each of the full puzzles.


Here is a list of the motifs that Rhiju has highlighted in e.coli:


U_TURN, UA_HANDLE, T_LOOP, INTERCALATED_T_LOOP, GNRA_TETRALOOP, A_MINOR, PLATFORM C, LOOP_E_SUBMOTIF, BULGED_G, TANDEM_GA_SHEARED and  GA_MINOR


These motifs may not all be equally conserved. I think we need to look more into what these motifs are, to find out which of them we had better leave alone.



Eli Fisker

By the way, you may have noticed some small white or black triangles, circles and squares between some of the bases in the RiboDrawings. 

I asked Rhiju what they meant. He said those are the symbols devised by Leontis & Westhof to annotate noncanonical base pairs, and he shared a paper including a symbol translation:

Geometric nomenclature and classification of RNA base pairs


jandersonlee

While I think it's important to know where these motifs are, I'm not so sure about the utility of "yet another booster", especially one that marks. At the moment a booster can only show one set of marks at a time, so really only one marking booster can be used at once. Rather than have a user flip between marking boosters, it might be better to add these as additional/overlapping "soft constraints", to avoid "booster overload".

jandersonlee

So the line format seems to be MOTIFNAME ( SP 'C:QA:' OFFSET [ '-' OFFSET ] )* where I'm assuming the offsets are base numbers or ranges for bases involved in a motif and/or tertiary interaction. Any word on which of these motifs we should consider to "lock" the bases? In the case of A_MINOR (which you say should be AA), is there another base before/after one of the referenced bases we should also lock?

Ah I see some C:RA: as well. What's the difference?

LOOP_E_SUBMOTIF C:RA:71-73 C:RA:103-105

And on 16S it is B:V:OFFSET

A_MINOR B:V:1170 B:V:1089 B:V:1096
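If we take the line format guessed above at face value, a parser is short. Here is a minimal Python sketch, assuming each token consists of two colon-separated tags followed by a base number or a start-end range. What the tags (C:QA, C:RA, B:V) mean is exactly the open question here, so the code keeps them as an opaque label. The numbers are in full-rRNA numbering and would still need mapping to puzzle numbering. Function names are made up for illustration:

```python
import re

# TAG1:TAG2:START or TAG1:TAG2:START-END, per the format guessed in this thread.
TOKEN = re.compile(r"^([A-Za-z0-9]+):([A-Za-z0-9]+):(\d+)(?:-(\d+))?$")


def parse_motif_line(line: str) -> tuple[str, list[tuple[str, int, int]]]:
    """Split 'NAME tok tok ...' into the motif name and (label, start, end) ranges."""
    name, *tokens = line.split()
    ranges = []
    for tok in tokens:
        m = TOKEN.match(tok)
        if m is None:
            raise ValueError(f"unrecognised token {tok!r}")
        start = int(m.group(3))
        end = int(m.group(4)) if m.group(4) else start
        ranges.append((f"{m.group(1)}:{m.group(2)}", start, end))
    return name, ranges


def positions_for(lines: list[str], wanted: set[str]) -> set[int]:
    """Collect every base number covered by motifs whose name is in `wanted`."""
    bases = set()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        name, ranges = parse_motif_line(line)
        if name in wanted:
            for _, start, end in ranges:
                bases.update(range(start, end + 1))
    return bases
```

For the two example lines above, `positions_for(lines, {"A_MINOR"})` gives `{1089, 1096, 1170}`, and asking for `{"LOOP_E_SUBMOTIF"}` instead gives bases 71-73 and 103-105. A filter like this could feed either a marking booster or extra soft constraints.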
