Ribosome Pilot Challenge

  • 3
  • Idea
  • Updated 2 weeks ago
This is a conversation to discuss all aspects of the Ribosome Pilot Challenge, just announced at https://eternagame.org/web/news/9221020/.
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 963 Posts
  • 303 Reply Likes
  • excited

Posted 4 months ago

  • 3
Photo of Eli Fisker

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes
Turn DNA sequence into RNA or invert a sequence


A while back Cynwulf shared the fine TextMechanic tool with me. I was playing with making switch puzzles from natural sequences and was in need of turning some DNA sequences into RNA.

The sequences in science databases are also regularly inverted compared to what we are used to seeing in EteRNA.

The point is that it is helpful to have tools that can invert a sequence or turn DNA into RNA. Lately, however, TextMechanic has gone from being a free tool to one that allows 4 free runs an hour. That is fine, but it didn't cut it for what I want.

So I have made two small scripts that can do what I most want. Hope they may be of help to some of you too. 

Reverse Sequence Tool
Search and replace character Tool
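
For anyone who prefers to do this offline, here is a minimal Python sketch of the two operations described above: turning a DNA sequence into RNA (T becomes U) and inverting (reversing) a sequence. It is not the code of the two tools linked above; the function names are made up for illustration.

```python
def dna_to_rna(seq):
    """Replace every T with U (and t with u); other characters are left alone."""
    return seq.replace("T", "U").replace("t", "u")


def reverse_sequence(seq):
    """Return the sequence read in the opposite direction (no complementing)."""
    return seq[::-1]


if __name__ == "__main__":
    dna = "GATTACA"  # made-up example sequence
    rna = dna_to_rna(dna)
    print(rna)
    print(reverse_sequence(rna))
```

Note that reversing only changes the reading direction; it does not produce the complementary strand.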







Photo of worseize

worseize

  • 29 Posts
  • 12 Reply Likes
Fatal error
Photo of Eli Fisker

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes
New forum post on how we can simplify the ribosome task and use a booster to tell us which (unlocked) bases we should be careful about mutating.

Omei has kindly shared the summary he made of which bases are usual across all the E. coli relatives for the 5S puzzle.

Image by Omei


For details on the secret code ;) see: 

Using multiple criteria for evaluating mutations


Photo of spvincent

spvincent

  • 48 Posts
  • 8 Reply Likes
The 16s rRNA contains highly conserved regions (across species) and also variable regions. Perhaps it doesn't matter given the overall goals of the project but would it make sense to lock the conserved regions? Right now the locking seems to be based on tertiary bonding.
Photo of jandersonlee

jandersonlee

  • 549 Posts
  • 122 Reply Likes
We're working on a booster that indicates the species variations as soft constraints using colored markers to indicate known variations, black for mods that match the inter-species variants, and white for mods that don't. So far it has data for 5S and 23S. Hopefully 16S will be added soon. https://eternagame.org/web/script/9311310/
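
As a rough illustration of the marker logic described above (black for a mutation that matches a known inter-species variant, white for one that does not), here is a small Python sketch. It is not the booster's code; the function name and the variant table are hypothetical, and positions are assumed to be 1-based.

```python
def marker_color(position, new_base, wild_type, known_variants):
    """Classify a proposed base at a 1-based position.

    known_variants maps position -> set of bases seen in related species
    (a hypothetical data layout, not the booster's real format).
    """
    if new_base == wild_type[position - 1]:
        return None  # not a mutation, nothing to mark
    if new_base in known_variants.get(position, set()):
        return "black"  # matches an inter-species variant
    return "white"  # mutation not seen in the related species


if __name__ == "__main__":
    wild_type = "GCCUGG"             # made-up example sequence
    variants = {2: {"U"}, 5: {"A"}}  # made-up variant table
    for pos, base in [(2, "U"), (2, "G"), (3, "C")]:
        print(pos, base, marker_color(pos, base, wild_type, variants))
```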
Photo of jandersonlee

jandersonlee

  • 549 Posts
  • 122 Reply Likes
Added an initial soft constraint for 16S part 2 just now. Still working on parts 1 and 3 and a better part 2.
Photo of Eli Fisker

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes

Strategy for stabilizing the ribosome


The ribosome has a dual nature. In localized areas it is a switch, but it is a static RNA at other places.  


For a while I have focused mainly on what I consider switching areas. Now I have decided to treat the ribosome as if it were just another static RNA puzzle. It is of course a bit bigger. ;) No need to worry, though, if you haven’t worked with static RNA labs before.


Here is what static RNA labs like.


  • Static RNA loves having structure variation.


Every element (stem, end loop, multiloop, bulge) that is anywhere near another tends to be different in size. Luckily the ribosome has already taken care of this for us. The structure is fixed. See - nothing to worry about - just ignore! :)


  • Static RNA loves having sequence variation.


The more varied the sequence, the better. Luckily the ribosome has also mostly taken care of this for us already.


However, the larger the puzzle, the bigger the chance that some sequences are repeated or that a strong sequence has multiple potential partners. The more repeat sequences, the less stability.


This is where we can give E. coli a helping hand. While E. coli already works as nature made it, we can help it become more stable by targeting repetitive sequences that potentially make the ribosome a bit unstable in some areas.



Join the #sequencevariation experiment


I will stick the hashtag #sequencevariation on my designs. You are welcome to join. The more, the merrier. Let’s give our e.coli ribosome bug a bit more backbone. A stabler one at least. :)



Script help from jandersonlee


To make the task much easier, jandersonlee has brewed up two great scripts. (Plus many more)


Thx to Jeff for the fine scripts.



Finding the repeat copies and partners

Long repeat finder 1.13


This script can identify sequences of 6 bases and above that occur in more than one copy (match sequences). The script also identifies the potential partner sequences for each of these match sequences.
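
To make the idea concrete, here is a minimal Python sketch of what such a search could look like. It is not the Long repeat finder script itself: the function names are invented, positions are 0-based, only one fixed length is checked at a time, and it is an assumption that a "partner" means a stretch that could pair antiparallel with the match sequence, with G-U wobble pairs allowed.

```python
from collections import defaultdict

# Bases each base may pair with; including G-U wobble is an assumption.
PAIRS = {"A": "U", "U": "AG", "G": "CU", "C": "G"}


def find_repeats(seq, k=6):
    """Return {subsequence: [0-based start positions]} for k-mers seen more than once."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


def can_pair(match, candidate):
    """True if candidate could pair with match, base by base, in antiparallel orientation."""
    return len(match) == len(candidate) and all(
        c in PAIRS[m] for m, c in zip(match, reversed(candidate))
    )


def find_partners(seq, match):
    """Return the 0-based start positions of every potential partner of match in seq."""
    k = len(match)
    return [i for i in range(len(seq) - k + 1) if can_pair(match, seq[i:i + k])]


if __name__ == "__main__":
    toy = "GGGAAACUUUCCCGGGAAA"  # made-up sequence, not from any puzzle
    for kmer, starts in find_repeats(toy).items():
        print(kmer, starts, find_partners(toy, kmer))
```

Dropping "U" from the G entry and "G" from the U entry in PAIRS restricts the search to strict Watson-Crick partners.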


1) I open the E. Coli Ribosome 23S rRNA domain III puzzle and copy the original sequence.


GAUAAAGCGGGUGAAAAGCCCGCUCGCCGGAAGACCAAGGGUUCCUGUCCAACGUUAAUCGGGGCAGGGUGAGUCGACCCCUAAGGCGAGGCCGAAAGGCGUAGUCGAUGGGAAACAGGUUAAUAUUCCUGUACUUGGUGUUACUGCGAAGGGGGGACGGAGAAGGCUAUGUUGGCCGGGCGACGGUUGUCCCGGUUUAAGCGUGUAGGCUGGUUUUCCAGGCAAAUCCGGAAAAUCAAGGCUGAGGCGUGAUGACGAGGCACUACGGUGCUGAAGCAACAAAUGCCCUGCUUCCAGGAAAAGCCUCUAAGCAUCAGGUAACAUCAAAUCGUACCCCAAACCGACACAGGUGGUCAGGUAGAGAAUACCAAGGCGCU


2) Then I open the Long repeat finder script and paste in the puzzle sequence:



3) Scroll down and press the button Evaluate



In this puzzle there were no 6-base sequences that had a twin or triplet sequence.


While these two long sequences have 5 and 6 potential partner sequences, they are also rather weak, so they are not my primary interest.



When you have identified a match sequence that you think may be problematic, either because there are multiple repeats of it, it has way too many potential partners, or it is very strong, you can copy the match sequence and put it into the Mark subseq and complements script. Then it will mark the match sequence in red and the partner sequences in blue.


4) Next I copy out the match sequence CGGGGC as it is rather strong + it has two partners.




Mark subseq and complements


5) Install the Mark subseq and complements booster


First you need to make your own copy of the script. To use the booster, click Start from copy, choose Booster under Script type, and then Submit script.


For further help on how to use a booster or run a script, check AndrewKae’s fine guide:


Quick Start Guide to Using Scripted Tools in Eterna


6) Go back to your E. Coli Ribosome 23S rRNA domain III puzzle and refresh the page. This will load your new booster. Then pull the booster under the lightning button when in lab puzzles.


7) Paste in your CGGGGC sequence from the Long repeat finder script.



8) Click OK


9) Click OK to the next box and the script will mark the sequence in RED and the partner sequences in BLUE-GREEN.



10) Now what I look for is how close the partner sequences (BLUE-GREEN) are to the match sequence (RED). In the above case they are rather far apart. So judged on that, they are not as likely to interfere with each other.


11) Next I switch to native view and check how close the RED sequence gets to the BLUE-GREEN ones.


12) I also switch engines to look at the native states there. In neither of them does the RED sequence pair up with the BLUE-GREEN partners. The RED sequence does misfold with other sequences, though. If the RED and one of the BLUE-GREEN sequences paired up somehow, I would be likely to make a mutation in one of them.


13) Now there is one more sequence I find really interesting despite it having only one potential partner. It is the super strong GGGGGG.



While it may not end up with its partner, there are 2 places in this puzzle partial that have 4 repeat C’s. So I think this 6 G sequence could be worth disrupting, as the native folds in different engines show some tendency for these repeat C’s to pair up with the repeat G’s.



Repeat C’s marked




What to look for


  1. Look for long sequences that exist in multiple copies

  2. Look for strong sequences with lots of G’s and C’s. These are more likely to wish to go elsewhere - in other words potentially mess a bit with the overall stability of the ribosome.  

  3. If a long and strong sequence has a lot of potential partners, it is those closest by that matter the most.


When a sequence has multiple partners, it matters where they are. If the potential partners are really far apart in sequence, it is rather unlikely that they are going to interact (unless they also happen to be close in geometric space). I mean the RNA folds up as it gets made, and as such many sequences are shielded from each other just by this first folding up. So the partners I count as the worst are the absolutely closest.


4. The longer and stronger the sequence, the more likely it is to tip towards instability, especially if there are potential matches nearby.

5. To get an idea of whether your match sequence wants to go elsewhere, you can try checking the native state of the puzzle and switching between engines to see what they predict.

6. I see each of the partial 16s and 23s as individual parts in the ribosome.


When it comes to whether a sequence copy will cause trouble, I think its position in the full sequence matters. A strong sequence in the middle will have more chances to go elsewhere than if it were at either end of the full sequence.


The RNA folds up as it gets made, and as soon as a branch (like the puzzle partials for 16s and 23s) gets folded, any sequence inside has a good shot at staying put. So distance to the sequence partners matters a lot. I'm mainly interested in those that are close, because those are the ones that have the best chance to misbehave. So first I will look for copy repeats inside the same branch. Next I will look for copy repeats across two neighbouring branches - like two 16s or 23s puzzle partials.


If we take the 3 partials of the 16s puzzle, I basically think that repeat sequence is more detrimental if found in Partial 2 than in either Partial 1 or 3. Copy repeats in Partial 1 have no partner sequence before base 1 to mispair with. Similarly, copy repeats in Partial 3 won’t have any partner sequences to pair with after the last base. Partial 2, on the other hand, has the option to mispair with partner sequences in both Partial 1 and 3.
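
Two of the criteria discussed above, how strong a match sequence is and how close its nearest potential partner sits, are easy to put numbers on. The Python sketch below is only one possible way to do that: using the G/C fraction as a stand-in for "strength" and the difference in start positions as "distance" are assumptions made here, and the function names are invented.

```python
def gc_fraction(subseq):
    """Fraction of bases in subseq that are G or C (a crude proxy for strength)."""
    return sum(base in "GC" for base in subseq) / len(subseq)


def nearest_partner_distance(match_start, partner_starts):
    """Smallest distance in sequence between a match and its partners, or None if there are none."""
    return min((abs(p - match_start) for p in partner_starts), default=None)


if __name__ == "__main__":
    # Made-up positions, purely for illustration.
    print(gc_fraction("CGGGGC"))
    print(nearest_partner_distance(100, [20, 130, 400]))
```

Distance along the sequence says nothing about distance in 3D space, so a far-apart pair in this measure can still be close in the folded structure.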






(Edited)
Photo of Brourd

Brourd

  • 435 Posts
  • 79 Reply Likes
Something to consider for this hypothesis is that we should consult the crystal structure for where certain motifs like the A-minor motif interact with the base pairs of helices. Modifying these may disrupt some tertiary contacts and result in a fairly unstable ribosome.
Photo of JR

JR

  • 238 Posts
  • 19 Reply Likes
Where can we find that information?
Photo of Brourd

Brourd

  • 435 Posts
  • 79 Reply Likes
Probably in a pdb of the ribosome like https://www.rcsb.org/structure/4V4A
Photo of Eli Fisker

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes

I think getting the A-minor motif base positions for our e.coli is an excellent idea.


"The A-minor motif is by far the most abundant tertiary structure interaction in the large ribosomal subunit; 186 adenines in 23S and 5S rRNA participate, 68 of which are conserved. It may prove to be the universally most important long-range interaction in large RNA structures".

RNA tertiary interactions in the large ribosomal subunit: The A-minor motif


I have been looking for databases that might give the positions of the A-minor motifs in our E. coli. This is the closest I got. I found a paper that referred to an RNA motif database:


InterRNA: a database of base interactions in RNA structures



It contains A-minor motifs for Haloarcula marismortui, Thermus thermophilus, but not e. coli.



Here is the full search result:

http://27.126.156.175/interrna/AMlink.php



The G-ribo motif


While I was reading up on the A-minor motif, I stumbled on another group of RNA motifs - the G-ribo motif. There are 8 of them in E. coli. Even better, I have a paper spelling out their positions (Figure 3).


G-ribo: A new structural motif in ribosomal RNA


“Here we present a new RNA structural motif named G-ribo, which has been found in eight different places within the ribosome. This motif represents a specific side-by-side arrangement of two double helices connected by an unpaired region. The juxtaposition of the helices is stabilized through a complex system of specific interactions, which spread over several layers of stacked nucleotides. The location of the identified cases of this motif within the ribosome suggests that at least some of them play an important role in the formation of the ribosome tertiary structure and/or in its function.”


These G ribo motifs are probably at different base numbers in our puzzles, but there should be enough sequence that we should be able to pinpoint them there. Here is one of them. 


S521 : Base 506-529 in E Coli Ribosome 16s rRNA part 1




Photo of Eli Fisker

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes

From the look of it, Andy and co have locked up most of the G ribo motif bases in the more locked version of the puzzle.



Photo of jandersonlee

jandersonlee

  • 549 Posts
  • 122 Reply Likes
It looks like the soft constraints cover this too except for the C506,G529 pair: 
Photo of Eli Fisker

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes

S861 : Base 266-272 and 300-316 in E Coli Ribosome 16s rRNA part 2


More of the G ribo bases, but not all, are locked in the puzzle with the larger set of locked bases.




Photo of Eli Fisker

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes

S1047 : Base 69-78, 123-128 and 289-296 in E Coli Ribosome 16s rRNA part 3


Except base 291, that is an A according to the paper instead of the U in our puzzle.



Again most of the G ribo motif is locked up in the puzzle with the larger set of locked bases, but not everything.



Photo of jandersonlee

jandersonlee

  • 549 Posts
  • 122 Reply Likes

S861 : Base 266-272 and 300-316 in E Coli Ribosome 16s rRNA part 2
is partially locked by the soft constraints, but some bases are allowed to vary.

Photo of jandersonlee

jandersonlee

  • 549 Posts
  • 122 Reply Likes

S1047 : Base 69-78, 123-128 and 289-296 in E Coli Ribosome 16s rRNA part 3
is mostly locked by the soft constraints (all the loop bases and closing pairs, but not all of the next pairs in the stems).
Photo of Eli Fisker

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes

L1024 : Base 456-463 and 577-584 in E Coli Ribosome 23S rRNA domain II


L1309 : Base 33-40, 334-342, and base 349-355 in E Coli Ribosome 23S rRNA domain III


Base 337 is a G according to the paper, whereas it is an A in our puzzle.




L1642 : Base 27-34 and 354-373 in E Coli Ribosome 23S rRNA domain III





Photo of Eli Fisker

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes

L2323 : Base 252-258, 274-282 and 289-296 in E Coli Ribosome 23S rRNA domain V


L2383 : Base 242-248, 300-304 and 341-342 in E Coli Ribosome 23S rRNA domain V



These were the last of the 8 G ribo motifs. It looks like most of the bases in the 16s puzzle partials are either locked or accounted for with the color booster. However, most of the G ribo bases in the 23S partials are unlocked.
Photo of Eli Fisker

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes

In the quest for the positions of the A-minor motifs, I asked Rhiju if he happened to know a database containing those for e.coli.


He gave us something much better: a list with the base positions of the A-minor motifs, a 2D overview with the A-minor motifs drawn in yellow, plus an extensive list of other motifs.


I have added an image to illustrate his text, but the ribosome images with the motifs highlighted, were so big that I couldn’t upload them without losing information. So to get the full overview, follow his links.


Here is his reply:


I have a way to look through a 3D rna structure and pull out its motifs using a Rosetta application


I use that to set up ribodraw drawings


the files look like "*motifs.txt" in the ribodraw repository


here is an example:



https://github.com/ribokit/RiboDraw/blob/master/drawings/ribosome/23S_5S/rna_motif/4ybb_23S_5S_RPL.p...


actually I identify lots of different motifs, but you can find A-minors in there with the A_MINOR tag


that being said, I am a bit unhappy at the way these are being tagged -- A-minors usually involve AA dinucleotides docking into helices, and my script currently only highlights one of the A's, the one that makes a clearly identifiable base pair with the helix


I also played a bit last week with rendering the ribosome with the various motifs highlighted


A-minors are gold


here are links:


https://github.com/ribokit/RiboDraw/blob/master/drawings/ribosome/23S_5S/drawing_with_motifs.png



https://github.com/ribokit/RiboDraw/blob/master/drawings/ribosome/23S_5S/drawing_with_motifs_and_ter...



https://github.com/ribokit/RiboDraw/blob/master/drawings/ribosome/16S/drawing_with_motifs.png



So basically here is what I think we can do with what Rhiju has shipped our way: make a script booster that shows the positions of the A-minor motifs. I think this can be especially helpful when we get the full RNA puzzles. That way we can, as a final step, weed out mutations in our partials that may potentially mess up the 3D structure of the ribosome.


It may also be useful to have a script booster that can highlight the positions of some of the other motifs. This may have to be done as a motif filter view with multiple options to choose which motif to watch, as several of the motifs seem to overlap at the same spots. Again I think this may be useful in the final submission phase, when we are to submit 3 designs for each full puzzle.


Here is a list of the motifs that Rhiju has highlighted in e.coli:


U_TURN, UA_HANDLE, T_LOOP, INTERCALATED_T_LOOP, GNRA_TETRALOOP, A_MINOR, PLATFORM C, LOOP_E_SUBMOTIF, BULGED_G, TANDEM_GA_SHEARED and  GA_MINOR


Not all of these motifs may be equally conserved. I think we need to look more into what these motifs are, to find out which of them we had better leave alone.


(Edited)
Photo of Eli Fisker

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes
By the way, you may have noticed some small white or black triangles, circles and squares between some of the bases in the RiboDrawings. 

I asked Rhiju what they meant. He said: those are the symbols devised by Leontis & Westhof to annotate noncanonical base pairs and he shared a paper including a symbol translation:

Geometric nomenclature and classification of RNA base pairs

Photo of jandersonlee

jandersonlee

  • 549 Posts
  • 122 Reply Likes
While I think it's important to know where these motifs are, I'm not so sure about the utility of "yet another booster", especially one that marks. At the moment we only have the ability to have one set of marks at a time from a booster, so really only one marking booster at once. Rather than have a user flip between marking boosters, it might be better to add these as additional/overlapping "soft constraints". Trying to avoid "booster overload".
Photo of jandersonlee

jandersonlee

  • 549 Posts
  • 122 Reply Likes
So the line format seems to be MOTIFNAME ( SP 'C:QA:' OFFSET [ '-' OFFSET ] )* where I'm assuming the offsets are base numbers or ranges for bases involved in a motif and/or tertiary interaction. Any word on which of these motifs we should consider to "lock" the bases? In the case of A_MINOR (which you say should be AA), is there another base before/after one of the referenced bases we should also lock?

Ah I see some C:RA: as well. What's the difference?

LOOP_E_SUBMOTIF C:RA:71-73 C:RA:103-105

And on 16S it is B:V:OFFSET

A_MINOR B:V:1170 B:V:1089 B:V:1096
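
A small Python sketch of a parser for this line format, based only on the example lines quoted above. The function name is invented, the meaning of the two colon-separated prefix fields is not known here (they are simply carried through as strings), and a single offset is treated as a range of length one.

```python
import re

# One token looks like PREFIX1:PREFIX2:OFFSET or PREFIX1:PREFIX2:OFFSET-OFFSET.
TOKEN = re.compile(r"^([^:\s]+):([^:\s]+):(\d+)(?:-(\d+))?$")


def parse_motif_line(line):
    """Split one motif line into (motif_name, [(prefix1, prefix2, start, end), ...])."""
    name, *tokens = line.split()
    residues = []
    for tok in tokens:
        m = TOKEN.match(tok)
        if m is None:
            raise ValueError(f"unrecognized token: {tok!r}")
        prefix1, prefix2, start, end = m.groups()
        start = int(start)
        end = int(end) if end is not None else start
        residues.append((prefix1, prefix2, start, end))
    return name, residues


if __name__ == "__main__":
    print(parse_motif_line("LOOP_E_SUBMOTIF C:RA:71-73 C:RA:103-105"))
    print(parse_motif_line("A_MINOR B:V:1170 B:V:1089 B:V:1096"))
```

Motif names containing spaces, or offsets that are not plain non-negative integers, would need extra handling.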

(Edited)
Photo of Eli Fisker

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes

RNA necks as focus points for stability


Experimental Hypothesis: It is possible to alter the overall stability of an RNA by altering its neck sequence.


Neck = tails from each end of the sequence that pair up with each other.


Both the 5S rRNA and the 23S rRNA have a neck. I expect both static and switch RNA to be affected, potentially at long range, by whatever bases they have in their neck, for better or worse.


If you wish, you can join the experiment. The 5S puzzle neck holds the most options for mutation. Here is the hashtag I use: #Neck.


However, if you made changes to just the neck and did not hashtag your designs, no worries. Omei and Gerry had a wonderful idea that designs belonging to the same experimental group could potentially be dug out and pooled together by a script later. In this case by specifying the base area of the necks in 5S and 23S.
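
A rough Python sketch of how such pooling could work: compare each design with the original sequence and keep the designs whose mutations all fall inside the specified base ranges. This is only an illustration of the idea, not an existing script; the function names, the toy sequences and the ranges are made up, and positions are 1-based and inclusive.

```python
def mutated_positions(wild_type, design):
    """Return the 1-based positions where design differs from wild_type."""
    if len(wild_type) != len(design):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (a, b) in enumerate(zip(wild_type, design)) if a != b]


def only_in_regions(positions, regions):
    """True if there is at least one mutation and every one lies in some (low, high) range."""
    return bool(positions) and all(
        any(low <= p <= high for low, high in regions) for p in positions
    )


if __name__ == "__main__":
    wild_type = "GGCAAAAGCC"           # made-up sequence
    neck = [(1, 3), (8, 10)]           # made-up neck ranges
    for design in ["GACAAAAGUC", "GGCAUAAGCC"]:
        changed = mutated_positions(wild_type, design)
        print(design, changed, only_in_regions(changed, neck))
```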



Necks in the E. coli ribosome


This is rather beautiful. The ends of 5s and 23s are almost symmetrically placed, although these two RNAs have nothing in common when it comes to size. I also think that the neck of the 23s rRNA is important for the stability of the large subunit, just as the 5s rRNA likely is for the overall ribosome assembly.

From the PDB viewer https://www.rcsb.org/3d-view/5IT8/1


Ever since I saw the symmetrically placed ends of the 23s and the 5s rRNA, I have wanted to dig out the ends of the 16s rRNA too.


However, that was hard to do in the protein viewers in PDB and PDBe. But with Chimera it is easy. Here is how.


I downloaded the E. Coli ribosome in Chimera as described in this post.


(If Chimera shows a light blue background when you open it and a light blue lightning sign in the bottom right corner, Chimera is in quick load mode. This is excellent if you already have a saved molecule that you wish to load and view. But for loading a molecule from scratch, click the light blue lightning sign and you will get a lightning marker with a dark blue background and a black screen, as below.)



Call the Command line (Tools - General controls - Command line). Ask for the first and last base in the 16s rRNA. To get them, paste this into the command line:


select :1.AA,1534.AA


Here is how I figured which RNA chain is the 16s.


Our E. coli ribosome has 3 RNA chains.


Here is how our lab describes them:


The rest are proteins that are rather short in length. We know that the 5s rRNA is 120 bases long. The 16s is rather long and the 23s is really long. So I just looked through the chains and located the second longest. I also tried marking the whole chain to check that it was actually RNA that I had highlighted. Then I could see the length of the chain and just took the first and last base for the search. Since I located the right chain, I know that its surname is AA.





Notice how far away the ends of the 16s rRNA are from each other - still rather symmetrically placed - but absolutely no neck. (Two small green highlights)


Modifying these ends may also affect how easily the ribosome gets degraded. However my primary interest is the neck of 5s rRNA and to some degree also the neck of 23s rRNA.


Green highlights of the 5s and 23s rRNA beginnings and ends.


select :1.DA,2903.DA, 1.DB, 120.DB
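
The two select commands above follow the same pattern, so they can be generated from a table of chain names and lengths. The Python sketch below only builds the command text; it does not talk to Chimera. The chain names and lengths are the ones used in the commands above, the function name is invented, and it is an assumption that each chain is numbered from 1 up to its length.

```python
def ends_selection(chain_lengths):
    """Build a 'select' command for the first and last residue of each chain.

    chain_lengths maps chain name -> number of residues, assuming the
    residues are numbered 1..length.
    """
    parts = []
    for chain, length in chain_lengths.items():
        parts.append(f"1.{chain}")
        parts.append(f"{length}.{chain}")
    return "select :" + ",".join(parts)


if __name__ == "__main__":
    print(ends_selection({"AA": 1534}))             # 16s ends
    print(ends_selection({"DA": 2903, "DB": 120}))  # 23s and 5s ends
```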


Perspective


Should some of these 5s and 23s mutations be beneficial for ribosome stability, in the future I would also love to see ribosomes made of different 5s rRNA’s and 23s rRNA’s that are both changed in the neck regions. Like a pooling of the changes that were beneficial in one part of the rRNA with the changes that were beneficial in another part of the rRNA.   



Background posts


Earlier lab examples showing that base changes in the neck or end tail region alone had an effect on both static and switch lab puzzles.


Now I’m aware that in the case of our switch puzzles the effect on the tail bases may be due to the DNA tether the RNA is held between.



Different types of necks and their effects on the main design


Salish’s end bit discovery put in perspective


Photo of jandersonlee

jandersonlee

  • 549 Posts
  • 122 Reply Likes
https://www.ncbi.nlm.nih.gov/pubmed/11601857

J Mol Biol. 2001 Oct 12;313(1):215-28.
Anatomy of Escherichia coli ribosome binding sites. Shultzaberger RK, Bucheimer RE, Rudd KE, Schneider TD.

Abstract

During translational initiation in prokaryotes, the 3' end of the 16S rRNA binds to a region just upstream of the initiation codon. The relationship between this Shine-Dalgarno (SD) region and the binding of ribosomes to translation start-points has been well studied, but a unified mathematical connection between the SD, the initiation codon and the spacing between them has been lacking. Using information theory, we constructed a model that treats these three components uniformly by assigning to the SD and the initiation region (IR) conservations in bits of information, and by assigning to the spacing an uncertainty, also in bits. To build the model, we first aligned the SD region by maximizing the information content there. The ease of this process confirmed the existence of the SD pattern within a set of 4122 reviewed and revised Escherichia coli gene starts. This large data set allowed us to show graphically, by sequence logos, that the spacing between the SD and the initiation region affects both the SD site conservation and its pattern. We used the aligned SD, the spacing, and the initiation region to model ribosome binding and to identify gene starts that do not conform to the ribosome binding site model. A total of 569 experimentally proven starts are more conserved (have higher information content) than the full set of revised starts, which probably reflects an experimental bias against the detection of gene products that have inefficient ribosome binding sites. Models were refined cyclically by removing non-conforming weak sites. After this procedure, models derived from either the original or the revised gene start annotation were similar. Therefore, this information theory-based technique provides a method for easily constructing biologically sensible ribosome binding site models. Such models should be useful for refining gene-start predictions of any sequenced bacterial genome.

Copyright 2001 Academic Press.

PMID: 11601857 DOI: 10.1006/jmbi.2001.5040


https://en.wikipedia.org/wiki/Ribosome-binding_site

Ribosome-binding site
From Wikipedia, the free encyclopedia

A ribosome binding site, or ribosomal binding site (RBS), is a sequence of nucleotides upstream of the start codon of an mRNA transcript that is responsible for the recruitment of a ribosome during the initiation of protein translation. Mostly, RBS refers to bacterial sequences, although internal ribosome entry sites (IRES) have been described in mRNAs of eukaryotic cells or viruses that infect eukaryotes. Ribosome recruitment in eukaryotes is generally mediated by the 5' cap present on eukaryotic mRNAs.

Prokaryotes

The RBS in prokaryotes is a region upstream of the start codon. This region of the mRNA has the consensus 5'-AGGAGG-3', also called the Shine-Dalgarno (SD) sequence.[1] The complementary sequence (CCUCCU), called the anti-Shine-Dalgarno (ASD) is contained in the 3’ end of the 16S region of the smaller (30S) ribosomal subunit. Upon encountering the Shine-Dalgarno sequence, the ASD of the ribosome base pairs with it, after which translation is initiated.[2][3]

Variations of the 5'-AGGAGG-3' sequence have been found in Archaea as highly conserved 5′-GGTG-3′ regions, 5 basepairs upstream of the start site. Additionally, some bacterial initiation regions, such as rpsA in E.coli completely lack identifiable SD sequences.[4]
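
The complementarity between the SD and ASD sequences quoted above can be checked with a few lines of Python. This is just a worked example using strict Watson-Crick pairing; the function name is invented.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the RNA strand that pairs antiparallel with rna, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


if __name__ == "__main__":
    print(reverse_complement("AGGAGG"))  # CCUCCU
```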


Don't mess with the (A)SDS?
(Edited)
Photo of Eli Fisker

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes
Thx, Jeff. I was tempted to fiddle with the 16s tails also. :) 

However it looks like I got lucky in regards to 5S and 23S. 

The rRNA often gets made together in the same mRNA - in a so called operon - hence not all rRNA's contain SDS's. 

From Wikipedia: 

"Bacterial 16S ribosomal RNA, 23S ribosomal RNA, and 5S rRNA genes are typically organized as a co-transcribed operon. There is an internal transcribed spacer between 16S and 23S rRNA genes.[5] There may be one or more copies of the operon dispersed in the genome (for example, Escherichia coli has seven)."
https://en.wikipedia.org/wiki/Ribosomal_RNA


The latter also plays a role in how fast an organism replicates. 

Operon: "In genetics, an operon is a functioning unit of DNA containing a cluster of genes under the control of a single promoter.[1] The genes are transcribed together into an mRNA strand and either translated together in the cytoplasm, or undergo splicing to create monocistronic mRNAs that are translated separately, i.e. several strands of mRNA that each encode a single gene product. The result of this is that the genes contained in the operon are either expressed together or not at all. Several genes must be co-transcribed to define an operon.[2]"
https://en.wikipedia.org/wiki/Operon


(Edited)
Photo of Brourd

Brourd

  • 435 Posts
  • 79 Reply Likes
The ribosomal RNA are all transcribed together to ensure perfect stoichiometry. That way, the cell doesn't make too many of any particular rRNA, which would cost the cell energy to have these parts just floating around not doing any function. So each part of the rRNA is made together.

The Shine-Dalgarno sequence is essentially what binds mRNA (in prokaryotes) to the 16S ribosomal subunit. There are a lot of figures that show translation steps that I could show here if desired, but mRNA binds to the 16S subunit before it binds to the 23S subunit.

In theory, one would expect that transcription is the limiting factor of organism replication (you could probably claim that transcription of the ribosome is the limiting factor, but the cell will regulate in a lot of different ways how many ribosomes it makes, and transcription is pretty fast). Typically, it ends up being the rate of protein synthesis that limits cell growth. Which stems from how many ribosomes there are, which could be regulated by transcription, so there are some arguments to be made where the true limiting factor is. (this could also change depending on extracellular and growth conditions as well)
Photo of whbob

whbob

  • 187 Posts
  • 57 Reply Likes
Once the 16S grabs the start codon, does the tRNA come from or somehow connect to the 23S? 
Photo of Eli Fisker

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes
Hi whbob!

Recently I was watching Rhiju's RiboDraw images when I took notice that there were kissing loop interactions between some of the domains inside Rhiju's 23S drawing. 

I asked him a question, and his reply partly answers your question.

Eli: I can see several kissing hairpin loops in the 23S, thx to the tertiary interactions you have drawn.

What I find particularly beautiful is that the kissing loops are connectors between the different domains. So domain III connects to domain IV and domain V connects to domain I.

I wonder if there are such kissing loops connection between the small and big subunit?


rhiju_s_23s_-_with_kissing_loops_as_domain_connectors, green and red highlight




Rhiju: i don't think there are kissing loops connecting small and big

there are actually not that many RNA-RNA contacts between small and big subunits -- the ones that are identifiable are shown in this diagram:

https://github.com/ribokit/RiboDraw/blob/master/drawings/ribosome/70S/drawing.png

however there are some protein-mediated 'bridges' between small and large ribosomal subunit

they are literally called bridges and have names like 'B3' (B = bridge)

perhaps more interesting is that the small and large subunit get bridged by tRNAs when they transit through the ribosome. There is a 'kissing loop' between tRNA anticodon and the mRNA codon. At some point I want to update the ribodraw diagram to include those extra molecules -- just haven't had a chance.



Photo of whbob

whbob

  • 187 Posts
  • 57 Reply Likes
Hi Eli, thanks for the link. Yes, I see G1417 and A1418 of the 16S link to C1947 and G1948 of the 23S (if I am reading the schematic correctly). There is also a link from A702 on the 16S to G1846 on the 23S.
Interestingly, there is an AUG start codon nearby at A1413 (very close to the 16S/23S link).
I have read that the 16S and 23S need to wobble around to accommodate the tRNA protein building process. Maybe one of these connections is the lever and the other is the fulcrum. Looking forward to the updated drawing showing the bridges and 'kissing loop'.
Photo of Eli Fisker

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes

I have found a good short introduction to protein bridges between the ribosomal subunits. This is only the abstract of a paper, but it gives a clear overview. There is also a graphical abstract, which is a praiseworthy thing.


It says that there are 12 intersubunit bridges.


Also, there is a special bridge, B1b, that is formed by proteins in both the small and the large subunit.


The Intersubunit Bridge B1b of the Bacterial Ribosome Facilitates Initiation of Protein Synthesis and Maintenance of Translational Fidelity


I have found another paper that goes more into the details:


"Bridge B1b connects S13 and large subunit proteins L5 and L31 in the classic-state T. thermophilus ribosome, through electrostatic and possibly hydrogen-bond interactions (Table 2)."


Intersubunit bridges of the bacterial ribosome


Now this is Thermus thermophilus and not E. coli, but I'm assuming that since they are both bacteria, they may have something similar going on. I searched PDB for the ribosomal bridge protein and yup, that is the case.



http://www.rcsb.org/pdb/results/results.do?tabtoshow=Current&qrid=5EC65002

So bridge B1b connects protein S13 in the small subunit with proteins L5 and L31 in the large subunit.


Now I thought that L5 sounded really familiar. It is one of the proteins that bind to our 5S rRNA.


Here is an image I posted of the protein binding sites in 5S earlier.

Image source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC344668/?page=4, green color my addition.


Since the first abstract says that the ribosome does not assemble correctly or work well if the B1b bridge is deleted, things are making even more sense now. I have heard that the ribosome doesn't work well without the 5S rRNA either. (There are a few exceptions, like some mitochondrial ribosomes that do not have a 5S rRNA but use a tRNA instead.) So when the 5S rRNA isn't around, there is less for the B1b bridge to bind to between the two subunits.


Photo of jandersonlee

jandersonlee

  • 549 Posts
  • 122 Reply Likes
Hmm. It looks like *most* of the 5S rRNA is binding to *something* else. Avoiding changing _any_ of those bases would not leave much wiggle room for modifications. Just for myself, I'll probably stick with the soft-constraints as the basis for what to change, as those SNP mods have at least been seen in WT rRNA that still *work* for some gammaproteobacterium.
Photo of whbob

whbob

  • 187 Posts
  • 57 Reply Likes
I just made a quick read of the ncbi link to the intersubunit bridges of the bacterial ribosome above. Wow, the multi axis motion of the small subunit is extremely intricate. Makes a DVD reader seem like a simple machine.
Maybe someday, we can just concentrate on that aspect of the ribosome. Good things might come from it.
Photo of Eli Fisker

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes
In one of my posts above I mentioned an image in relation to kissing loops that did not show.

rhiju_s_23s_-_with_kissing_loops_as_domain_connectors, green and red highlight my addition




I have reduced the image quality quite a lot to post the image. So here is a link to the original image - without my color addition. 

https://github.com/ribokit/RiboDraw/commit/3eeead9415c2b313d9420ac17bb6f561f96fc9fd

The image is the middle one of 3. 


Photo of Gerry Smith

Gerry Smith

  • 62 Posts
  • 27 Reply Likes

The below chart shows the Deltas and # Mutations for the recent 40 Full 23S IUPAC compliant designs.

Omei asked me to comment on and share the method I used for the “red dot” design #9360042 because of its different Delta efficiency profile.

Looked at on a per-mutation basis, the IUPAC constraints dramatically lower the number of potential designs, from nearly uncountable to a reasonable 1,422. See table below.


A full list of these mutations is contained in tab “IUPAC Choices” in the attached file “InsertAid”.

Additionally, any of the lists from the IUPAC Choices tab can be copied and pasted into the Insert tab (see instructions in the Insert tab) to systematically generate lists of IUPAC-compliant sequences, which you can then cut and paste into the Mutation Tool List box to look through the designs.
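
The list generation described above can also be sketched in a few lines of Python. This is only an illustration, assuming the constraints are available as a string of IUPAC nucleotide codes aligned with the wild-type sequence. The function name `compliant_single_mutants` and the toy sequences are hypothetical and are not taken from the InsertAid spreadsheet.

```python
# IUPAC nucleotide codes mapped to the RNA bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU",
    "K": "GU", "M": "AC", "B": "CGU", "D": "AGU",
    "H": "ACU", "V": "ACG", "N": "ACGU",
}


def compliant_single_mutants(wild_type, constraints):
    """Yield (position, new_base, sequence) for every single-base change
    that the constraint string allows. Positions are 1-based."""
    if len(wild_type) != len(constraints):
        raise ValueError("sequence and constraint string must have equal length")
    for i, (wt_base, code) in enumerate(zip(wild_type, constraints)):
        for base in IUPAC[code]:
            if base != wt_base:
                yield i + 1, base, wild_type[:i] + base + wild_type[i + 1:]


# Toy example (made-up sequence and constraints, not the real 23S data).
wild_type = "GGUUAAGC"
constraints = "GRUYAANC"
mutants = list(compliant_single_mutants(wild_type, constraints))
for position, base, sequence in mutants:
    print(position, base, sequence)
```

Each printed sequence differs from the wild type at exactly one position, so the output could be pasted into a list-based tool one line per candidate.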

 Further instructions are in the attached InsertAid file.

https://1drv.ms/x/s!AjD0uVJanThY8QhM0ZYeZeMurwhb 

Whoops - this file is too big to share this method - I will need to find another way....

This brute-force method has dramatically sped up my ability to evaluate solutions. In future, I will be looking to automate additional ways to eliminate choices at each stage (and especially for ways to speed up the 4-5 second Vienna model run!).

 The key principles I used to generate Design 9360042:

1.      Systematically cycle through specific IUPAC-compliant mutation changes for the entire 23S design (don’t optimize segments and then try to put them together). The design needs systematic delta coordination/mutation harmony.

2.      Use a high delta threshold before accepting a mutation (see Delta Thresholds below), because any change leads to a radically different path. The assumption is that choosing lower-Delta selections will rapidly deteriorate good future choices.

3.      Be agnostic as to adding or subtracting mutations to find Delta/mutation improvements (see below for the Reverse Mutation method).

4.      Diversify / balance / randomize the types of mutation changes.

 

Secondary Principles:

1.      Use lower delta threshold in evaluating C’s back to loops.

2.      Use lower Delta threshold for pair selection than loops.

3.      Use lower Delta Threshold for mutation decreases.

 

Reverse Mutation

 Objective – find a mutation to remove that improves the prior sequence’s delta (the sequence before the last added mutation) by at least the Delta Threshold.

 

Method:

1.      Select Mark Mutations in the mutation tool for the sequence you wish to improve.

2.      Run mutations

3.      Page through mutations just watching for designs with 1 less mutation, keeping note of lowest Delta.

4.      If you can find a sequence that reduces prior sequence’s delta by Delta Threshold, move to that sequence.
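
The Reverse Mutation steps above could be automated roughly as follows. This is a sketch under stated assumptions: a lower Delta is treated as better, `delta_fn` is a hypothetical stand-in for whatever folding-model evaluation produces the Delta (here replaced by a made-up toy score), and the strict "greater than" comparison follows the way the thresholds are written in this post.

```python
def reverse_mutation(wild_type, current, prior_delta, delta_fn, threshold=1.0):
    """Try reverting each mutated position of `current` back to the wild-type
    base. Return (sequence, delta) for the reversion with the lowest delta
    that beats `prior_delta` by more than `threshold`, or None if none does."""
    best = None
    for i, (wt_base, cur_base) in enumerate(zip(wild_type, current)):
        if wt_base == cur_base:
            continue  # position is not mutated, nothing to revert
        candidate = current[:i] + wt_base + current[i + 1:]
        delta = delta_fn(candidate)
        improves_enough = (prior_delta - delta) > threshold
        if improves_enough and (best is None or delta < best[1]):
            best = (candidate, delta)
    return best


def toy_delta(sequence):
    """Made-up score for illustration only; NOT a real energy model."""
    return -1.5 * sequence.count("G") - 1.0 * sequence.count("C")


wild_type = "AAUUGGCC"
current = "AGUUGACC"   # two mutations relative to the wild type
result = reverse_mutation(wild_type, current, prior_delta=-5.0, delta_fn=toy_delta)
print(result)
```

In practice `delta_fn` would be the slow step, so caching its results per sequence would be a natural addition.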

 

Delta Thresholds (these were thresholds I used for this design – I am currently trying higher levels)

 Added mutation >2

Eliminate mutation >1

Pair selection >2 or 1/NT

Adding C’s back to Loop > 1.5
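
For anyone scripting this, the thresholds above can be written down as a small lookup plus an acceptance test. This is a minimal sketch, assuming that an "improvement" means the Delta went down; the dictionary keys and the function name `accept_change` are hypothetical labels, and the alternative "1/NT" reading of the pair threshold is not modelled.

```python
# Thresholds as listed in the post; the key names are illustrative labels.
DELTA_THRESHOLDS = {
    "add_mutation": 2.0,
    "eliminate_mutation": 1.0,
    "pair_selection": 2.0,
    "add_c_to_loop": 1.5,
}


def accept_change(kind, delta_before, delta_after):
    """Accept a change only if it lowers the delta by more than the
    threshold for that kind of change (assumes lower delta is better)."""
    improvement = delta_before - delta_after
    return improvement > DELTA_THRESHOLDS[kind]


print(accept_change("add_mutation", -10.0, -12.5))        # improvement of 2.5
print(accept_change("add_mutation", -10.0, -12.0))        # improvement of exactly 2.0
print(accept_change("eliminate_mutation", -10.0, -11.5))  # improvement of 1.5
```

Keeping the numbers in one table makes it easy to experiment with higher levels without touching the acceptance logic.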

 

(Edited)
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 963 Posts
  • 303 Reply Likes
Here's a link that you can use to download Gerry's Excel spreadsheet: https://drive.google.com/file/d/1EFyQJKk-IsrQo3lkMg3aTkLLpVnFNqJ1/view

Google will try to create a preview, and fail, but will still give you the option to download it.
Photo of DigitalEmbrace

DigitalEmbrace

  • 27 Posts
  • 19 Reply Likes
Very interesting. Gerry, could you create a delta/mutation chart for non-compliant designs?
Photo of whbob

whbob

  • 187 Posts
  • 57 Reply Likes
What version of Excel do I need to read that many columns?
Photo of Omei Turnbull

Omei Turnbull, Player Developer

  • 963 Posts
  • 303 Reply Likes
I don't know. I have a subscription, and hadn't thought about that as an issue. What version of Excel do you have?
Photo of whbob

whbob

  • 187 Posts
  • 57 Reply Likes
I don't have one. My Mac has Numbers and LibreOffice, neither of which handles more than 1024 columns.
Photo of whbob

whbob

  • 187 Posts
  • 57 Reply Likes
I don't want to buy a version that can't handle the columns we need these days.
Photo of Gerry Smith

Gerry Smith

  • 62 Posts
  • 27 Reply Likes
I use the version that comes with Office 365.
Photo of Gerry Smith

Gerry Smith

  • 62 Posts
  • 27 Reply Likes
DigitalEmbrace - I currently pull the Deltas for each design by hand, so building a chart for the non-compliant designs would be too time consuming for me. I am going to delve into jandersonlee's arcplot booster to see if I can run large numbers of mutations at once, which would address this, but I will stay focused on design improvement for now.
Photo of DigitalEmbrace

DigitalEmbrace

  • 27 Posts
  • 19 Reply Likes
Definitely not worth the time doing by hand. 
Photo of Eli Fisker

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes

Omei brought up yesterday that voting on the 23S lab puzzle was thin and that many of the designs were not following the soft constraints (that is, limiting changes to ones that have been seen to work in organisms related to E. coli).


Astromon came up with the brilliant idea of getting the soft constraint booster turned on by default. But since we are not there yet, the lab will close before that happens, and I suspect that a lot of you have not found the built-in booster yet, I will show how to activate it.


  1. Under the Lightning arrow choose the Soft Constraint Marker



This will activate a system of colored rings around the bases that will show what bases are considered safer to change. Most important highlights:


  • Legal changes will get a black ring

  • Potentially illegal changes will get a white ring

  • If you don’t know the color code system of the rings yet, try changing some bases. The black and white rings will tell you whether your change is considered good or bad by the booster.


For a more detailed orientation on the base color system, see this forum post:


Using multiple criteria for evaluating mutations


NB: The booster is built in, so it no longer needs to be called via a script as described in the post.



Photo of Eli Fisker

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes
Omei did something to help us find the designs that fulfill the soft constraints. He made a spreadsheet where you can sort 23S designs by their number of IUPAC-violations or their mutation-count. 

Ribosome Challenge Solution Stats




Here is what Omei said: 
"I noticed the voting for the 23S puzzle is pretty light, and numerous designs with a ton of IUPAC violations are in the top votes. I'm guessing that this is largely due to the solutions being hard to review.


I've expanded a script I've used a lot in the past (https://eternagame.org/web/script/7186229/) to include the ability to ask for the number of mutations and the number of IUPAC constraints for each and created the spreadsheet Ribosome Challenge Solution Stats (https://docs.google.com/spreadsheets/d/1j2UWHhl-jhFIqd2JXHFvW16JXy7kPje6H_H984ju4iI/edit#gid=0&f...) to help narrow down the search for more promising designs. Hopefully we can refine our choices before voting closes."
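
To make the two spreadsheet columns concrete, here is a rough sketch of how a mutation count and an IUPAC violation count could be computed and used for sorting. It is not the script linked above; it assumes the soft constraints are given as a string of IUPAC codes aligned with the wild type, and the names `design_stats`, `designs` and the toy sequences are hypothetical.

```python
# IUPAC nucleotide codes mapped to the RNA bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU",
    "K": "GU", "M": "AC", "B": "CGU", "D": "AGU",
    "H": "ACU", "V": "ACG", "N": "ACGU",
}


def design_stats(wild_type, constraints, design):
    """Return (mutation_count, violation_count) for one design."""
    mutations = 0
    violations = 0
    for wt_base, code, base in zip(wild_type, constraints, design):
        if base != wt_base:
            mutations += 1
        if base not in IUPAC[code]:
            violations += 1
    return mutations, violations


# Toy data (made up, not real 23S designs).
wild_type = "GGUUAAGC"
constraints = "GRUYAANC"
designs = {
    "design A": "GAUCAAGC",
    "design B": "CGUUAAUC",
    "design C": "GGUUAAGC",
}

# Sort by fewest violations first, then by fewest mutations.
ranked = sorted(
    designs,
    key=lambda name: design_stats(wild_type, constraints, designs[name])[::-1],
)
for name in ranked:
    print(name, design_stats(wild_type, constraints, designs[name]))
```

Sorting on violations before mutations mirrors the idea of narrowing the search to constraint-following designs first.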
(Edited)
Photo of DigitalEmbrace

DigitalEmbrace

  • 27 Posts
  • 19 Reply Likes
When did Omei and Eli and apparently others decide that any design violating soft constraints would definitely fail? I thought the soft constraints were merely one approach; I didn't realize they were the only approach.
Photo of jandersonlee

jandersonlee

  • 549 Posts
  • 122 Reply Likes
It's not that they will definitely fail, but rather that the SNPs (single-nucleotide polymorphisms) that make up the soft-constraints are mutations that have been seen to *work* in nature, hence might be more likely to work. I've voted for some designs that include some soft constraint violations, but the majority of my votes go to those that don't have any/many. Also, a design that focuses on lowering the dFE without looking at the resulting misfolds, mispairings and secondary shape alterations won't get my vote.
(Edited)
Photo of jandersonlee

jandersonlee

  • 549 Posts
  • 122 Reply Likes
Also, there are many areas of the rRNA that are involved in tertiary interactions with itself, with other rRNA, with proteins, or with mRNA and the protein being produced. There are too many regions for me to track them all, so I use the soft-constraints as a *proxy* for saying that modifying this area *may* be safer, as it will be less likely to affect these other tertiary interactions, because there is at least one wild-type ribosome that worked with this change.
Photo of DigitalEmbrace

DigitalEmbrace

  • 27 Posts
  • 19 Reply Likes
A design that stabilizes the target structure and eliminates most global misfolds may very well not work, but it seems worthwhile to select at least one such design and if the synthesized 23s is a hot mess, then we can cross that approach off the list.

Dl2007 has consistently produced some of the most successful designs. It's like benching one of your star players during the playoffs. 
Photo of jandersonlee

jandersonlee

  • 549 Posts
  • 122 Reply Likes
I've transferred a vote to dl2007's "23s 35 changes"
Photo of Brourd

Brourd

  • 435 Posts
  • 79 Reply Likes
This is more of an issue with not properly identifying the regions of the RNA that should not be modified because doing so may kill activity. As jandersonlee stated, the soft constraints are based on consensus sequence conservation across the proteobacteria phylum (of which E. coli is a member). There are multiple regions in the RNA that are well conserved, and modification of these will generally result in deactivation of the ribosome. The analogy of benching a star player at the playoffs is not a particularly relevant one, considering that an individual who is a star at baseball may not necessarily be good at golfing, as an example.

However, with that said, conservation of some of these sequences doesn't necessarily correlate to increasing activity or stability. Random mutations to one of the ribosome genes that are not corrected will result in an organism with decreased fitness, and so most of the mutations that are conserved across the proteobacteria phylum are more likely related to regions that have been modified for enhanced ligand recognition to either enhance or decrease activity or regulation, or they're random mutations that are not as likely to affect global activity. As a result, mutating these regions may not actually give the desired increase in stability or activity (or whatever metric they are aiming to measure). Essentially, over-fitting for known mutations may not be the most appropriate direction for this project.

Mayhaps it would be best to take inspiration from a thermophilic prokaryote? Or mayhaps it would be good to dedicate time to more complex hypotheses about helix redesign? Mayhaps it would be useful to focus on redesigning all of the pseudoknotted helices? Whatever the case, a single round will not rule in or rule out any hypotheses, although it may bias players to a single sequence if they're given hope it works better than the WT.
Photo of Brourd

Brourd

  • 435 Posts
  • 79 Reply Likes
As an addition to the first point, it is also not known which mutations kill activity unless they have been specifically tested for that. Such tests have generally focused on the core of the active site or on regions proposed to interact with some protein.