Ribosomal Binding Sites in RNA violated in our designs
The E. coli ribosome consists of more than an RNA core made from 3 RNA chains - 5S, 16S and 23S. It is clamped together with a large number of smaller proteins, 51 ribosomal proteins to be precise.
28 ribosomal proteins bind to the large subunit (LSU), 23S
20 ribosomal proteins bind to the small subunit (SSU), 16S
6 ribosomal proteins bind to 5S.
If you notice that the protein numbers, when added, exceed 51, it is because some of the ribosomal proteins that touch 23S also bind to 5S.
How do ribosomal proteins binding sites relate to our lab designs?
I have collected all the RNA motif and ribosomal binding site violations in one document.
There is only one 23S design (2.17) that violates neither ribosomal binding sites nor RNA motifs.
Also I find Astromon's 2.21 23S design really curious. It is one of the designs with a lot of mutations: 43. Most of the designs with such a huge number of mutations do really badly. Yet it is low in both RNA motif and ribosomal site violations (13 violations) in comparison to the other designs with many mutations. And this design does really well.
Similarly the 16S design 2.11, with 21 mutations but only 7 total violations, does well too.
This suggests that one can get away with a lot of mutations, if one is a little careful not to violate too many motifs or ribosomal binding sites. Even though the ribosomal binding site violations are already in accordance with IUPAC.
On the 5S sequences, the design that has the fewest RNA motif and protein binding site violations, Gerry Smith's 2.05, does close to best.
Our ribosome lab designs versus RNA motifs and ribosomal protein binding sites
2.01: 0 motif violations, 2 binding violations - total 2
2.02: 1 motif violation (Platform), 2 binding violations - total 3
2.03: 0 motif violations, 1 binding violation - total 1
2.04: 0 motif violations, 2 binding violations - total 2
2.05: 1 motif violation (Platform), 0 binding violations - total 1
2.06: 0 motif violations, 1 binding violation - total 1
2.07: 0 motif violations, 2 binding violations - total 2
2.08: 0 motif violations, 2 binding violations - total 2
2.09 - 0 motif violations, 2 binding violations - does fair (2M)
2.10 - 0 motif violations, 3 binding violations - does fair (13M)
2.11 - 1 motif violation, 6 binding violations - does fair (21M) (Z-turn)
2.12 - 7 motif violations, 20 binding violations - does bad (55M) (2 A-minor) (5 G ribo)
2.13 - 2 motif violations, 8 binding violations - does OK (17M) (2 G-ribo)
2.14 - 2 motif violations, 4 binding violations - does bad (11M) (Platform, GA-minor)
2.15 - 2 motif violations, 2 binding violations - does fair (4M) (Same Z-turn motif)
2.16 - 6 motif violations, 21 binding violations - does bad (55M) (A-minor, Z-turn, Loop E submotif, 2 G ribo)
2.17 - 0 motif violations, 0 binding violations - does fair (1M)
2.18 - 0 motif violations, 2 binding violations - does bad (32M)
2.19 - 2 motif violations, 2 binding violations - does fair (15M) (Platform/Bulged G, A minor)
2.20 - 17 motif violations, 13 binding violations - does bad (72M) (Platform/GA minor, Loop E, GA minor, Platform, Bulged G, U-Turn, GA minor, U-Turn, Bulged-G/Platform, U-Turn, Platform) *Two bases were changed in U-Turn, GA Minor, U-Turn*,A minor
2.21 - 5 motif violations, 8 binding violations - does fair (43M) (UA handle, GA minor, Tandem GA)
2.22 - 0 motif violations, 2 binding violations - does OK (4M)
2.23 - 1 motif violation, 2 binding violations - does fair (7M) (A minor non-WC pair)
2.24 - 5 motif violations, 11 binding violations - does bad (35M) (A minor, Platform, U-Turn, Platform/Bulged G, T-Loop, Active site)
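To compare violations against results more easily, the 2.09 to 2.24 list above can be put in a small table. This is only a sketch: the numbers are copied from the list, the result labels ("fair", "OK", "bad") are the ones used there, and averaging total violations per label is my own illustrative choice, not a statistical test.

```python
from collections import defaultdict

# (design, motif violations, binding violations, mutations, result),
# copied from the list above
designs = [
    ("2.09", 0, 2, 2, "fair"),
    ("2.10", 0, 3, 13, "fair"),
    ("2.11", 1, 6, 21, "fair"),
    ("2.12", 7, 20, 55, "bad"),
    ("2.13", 2, 8, 17, "OK"),
    ("2.14", 2, 4, 11, "bad"),
    ("2.15", 2, 2, 4, "fair"),
    ("2.16", 6, 21, 55, "bad"),
    ("2.17", 0, 0, 1, "fair"),
    ("2.18", 0, 2, 32, "bad"),
    ("2.19", 2, 2, 15, "fair"),
    ("2.20", 17, 13, 72, "bad"),
    ("2.21", 5, 8, 43, "fair"),
    ("2.22", 0, 2, 4, "OK"),
    ("2.23", 1, 2, 7, "fair"),
    ("2.24", 5, 11, 35, "bad"),
]

# Total violations per design (motif + binding)
totals = {name: motif + binding for name, motif, binding, _, _ in designs}

# Average total violations for each result label
by_result = defaultdict(list)
for name, motif, binding, mutations, result in designs:
    by_result[result].append(motif + binding)
mean_by_result = {label: sum(v) / len(v) for label, v in by_result.items()}

# Print the designs ordered by number of mutations
for name, motif, binding, mutations, result in sorted(designs, key=lambda d: d[3]):
    print(f"{name}: {mutations:>2} mutations, {totals[name]:>2} violations, {result}")
print(mean_by_result)
```

With so few designs this can only hint at a trend, and 2.18 (32 mutations, 2 violations, does bad) shows that few violations alone are no guarantee.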
How many 5S solves without violations are there?
I wonder how many ways it is possible to solve the 5S puzzle in game so it is stable without violating any of the RNA motifs, ribosomal protein binding sites and IUPAC?
I made a solve in Vienna2 and ran the mutation booster on it to gather some more solves.
Very conservative list of safe mods
Here are my Vienna2 attempts. There should be more solves when combining some of these bases for legal solves. Perhaps you can come up with some other ways. And how about the other engines? Is it possible to solve it there?
RNA motif and ribosomal binding site overview
DigitalEmbrace started out the 23S sheet with the locked bases and let me copy it, Omei proposed how to list the RNA motifs, and Gerry helped with finding the paired bases for 23S. Omei helped with some of the functions in the sheet, and so has jandersonlee, as I have imported a bunch of columns from the helix map sheet he started. Rhiju did the biggest job by identifying the RNA motifs and ribosomal binding sites for Escherichia coli in the first place.
Ribosome puzzles with fewer bases to mutate
By knowing where all the bases that have a specific function are, we have already reduced the task of mutating the ribosome considerably. Here is a list of the bases that are left after taking out all the bases with known functions:
Number of bases in the original puzzles in parentheses.
Very conservative list of safe mods
5S 28 bases (120)
16S 330 (1534)
23S 713 (2904)
Super conservative list of safe mods
5S 23 bases
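The "taking out all the bases with known functions" step is a set difference. Here is a minimal sketch of it. The function name `safe_positions` and the toy position sets are made up for illustration; the real sets live in the motif and binding site spreadsheets.

```python
def safe_positions(length, *functional_sets):
    """Return the 1-based positions that appear in none of the given sets."""
    taken = set().union(*functional_sets)
    return [pos for pos in range(1, length + 1) if pos not in taken]


# Hypothetical toy example: a 12-base RNA with invented positions
motif_bases = {2, 3, 7}
protein_site_bases = {3, 4, 10}
locked_bases = {1, 12}

safe = safe_positions(12, motif_bases, protein_site_bases, locked_bases)
print(safe)  # [5, 6, 8, 9, 11]
```

Adding another set of excluded bases only removes more positions, which would fit a "super conservative" list being shorter than a "very conservative" one.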
We already know that we can probably get away with violating some constraints, and that not all RNA motifs are equally grumpy about getting changed. There was Jieux's design that modified a hairpin and violated IUPAC and still did well. Among the designs doing well are also some that violated some ribosomal protein binding sites. However, touching too many of the bases with a known associated function at once seems a sure way to trouble.
So this list is not to say that we can't ever mutate these bases. Rather it is a helping hand, so we are aware when we do and can do it with a purpose.
Overviews of Rhiju's RNA motifs and ribosomal protein binding sites
Ribosomal protein binding sites
NB, every time it says RA in any of these lists, it refers to a binding site in the 5S
Is it worse mutating in stems or single bases in ribosomal protein binding sites?
I have been asking around about the seriousness of mutating in ribosomal binding site bases versus mutating in RNA motifs. I sent my discussion with Omei to Andy and Antje.
Eli: I assume it is more serious messing with the ribosomal protein binding sites than most motifs.
Omei: FWIW, I wonder about the high importance of protein site mutations, at least in helices. RNA-protein bindings are rarely hydrogen bonds, AFAIK.
Eli: I wonder if those malleable sites (according to IUPAC) in the ribosomal protein binding sites are in stems then.
Omei: ... and in a helix, the specific bases aren't exposed. The only effect it could have with a protein is the indirect one that comes from the fact that although the helix shape is pretty much uniform, specific base pairings can make small tweaks.
Eli: Ok, so what I hear you say is that ribosomal protein binding sites are perhaps less sensitive to modifications than RNA motifs.
Omei: I'll hedge on that last comment. I don't really know that the proteins can never get close to the bases via one of the grooves. That's my belief. But you could ask Andy or Antje to get better advice.
I have checked the 16S RNA ribosomal binding sites. 33 of the ribosomal protein binding site bases that had alternative IUPAC options for a solve were single bases. 82 of them were in a stem. So there is a majority of changeable stem bases in ribosomal protein binding sites compared to single bases. I also recall seeing one protein "binding" to an RNA base by having an Mg ion in between them when I was viewing E. coli in Chimera.
Andy Watkins: yeah, that’s an interesting possibility, actually
you can get some specificities from the grooves, actually
a guanine’s Hoogsteen edge (basically, the “long face” of the base) pairs quite naturally with an arginine side chain; ditto an adenosine H edge with asparagine/glutamine
Eli: So it is basically the purine bases at the ribosomal binding sites that we may have to be careful about changing, under very specific conditions: arginine or asparagine/glutamine being nearby in space.
Andy Watkins: those are definitely the biggest risk factors
For details on Hoogsteen bases see the bottom of the post.
The ribosome folds up in stages
Antje Krüger: I‘m convinced that mutations at nucleotides in the vicinity of ribosomal proteins can affect their binding and thereby the folding, re-folding, assembly and stability of the ribosome. The wild-type RNA itself would not fold into its structure without them. The assembly itself is an ordered process involving RNA folding and protein binding.
Eli: I kind of imagine the ribosome as an RNA ball clamped together with ribosomal proteins, so this makes good sense. So the ribosomal proteins themselves assist with the RNA folding?
Antje Krüger: Yes, they do. I imagine the proteins as shape keepers. There are also so called ‘assembly factors’ involved in ribosome biogenesis. These temporarily bind the ribosome and then e.g. chemically modify some of the nucleotides or break base pairings and form new ones.
Eli: I have read about chaperones as aides for folding RNA. I just hadn't thought about ribosomal proteins as such.
Antje Krüger: This is a recent article about the different stages of the large subunit when it gets reconstituted in vitro without assembly factors (please note that it is not iSAT, which is what I do).
I suggest to read the abstract, intro and discussion first and then dig into the specifics
Antje Krüger: and this is a recent review.
While our ribosome designs use iSAT - in vitro folding - and so may not fold up as the wild type does, I still think it is very interesting that the ribosome folds in steps. The paper identifies 5 folding states for the 23S. Perhaps we could target our mutations at specific ones of these states, so that our mutations hit a folding area that is timed to fold in the same step.
How many ribosomal binding sites are single versus stem?
Ribosomal protein binding sites
5S: 7 bases in stem, 5 single bases
16S: 82 bases in stem, 33 single bases
23S: 155 bases in stem, 77 single bases
The pattern holds for the whole ribosome: most of the ribosomal protein binding site bases that are malleable according to IUPAC are stem bases.
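As a quick check of the counts above, here is the share of stem bases among the malleable ribosomal protein binding site bases for each RNA. The numbers are taken straight from the list; only the variable names are mine.

```python
# (bases in stem, single bases) per RNA, from the list above
sites = {"5S": (7, 5), "16S": (82, 33), "23S": (155, 77)}

stem_fraction = {}
for name, (stem, single) in sites.items():
    stem_fraction[name] = stem / (stem + single)
    print(f"{name}: {stem_fraction[name]:.0%} of the bases are in stems")

# Whole ribosome
total_stem = sum(stem for stem, _ in sites.values())
total_single = sum(single for _, single in sites.values())
overall = total_stem / (total_stem + total_single)
print(f"All three: {overall:.0%} of the bases are in stems")
```

This gives roughly 58% for 5S, 71% for 16S, 67% for 23S and 68% overall.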
For an image of a Hoogsteen edge see Figure A
I have started a datasheet where I check into these bases
Hypothesis: Base repeats of 5 or more are causing transcription failures in the ISAT experiment, resulting in wasted energy and reduced protein synthesis.
Although we are focussing (for good reasons) on ribosome folding, there is another possibility I think we should test for -- transcription efficiency in the ISAT experiment.
In the ISAT experiments, the ribosome is first transcribed from ribosomal DNA and then the fluorescent protein is made from messenger RNA. If there is anything that makes transcription less efficient than in vivo, the end result would show up as lowered total fluorescence, because the fluorescence is ultimately limited by the amount of chemical energy (supplied by pyruvate) that the experiment starts with.
In Eterna labs, we have seen in the past that repeated bases can interfere with the polymerization of complementary nucleic acid chains (which are highly analogous to transcription, but usually distinguished with separate terms). This was very much an issue in the early labs, where repeated bases interfered with the DNA -> DNA polymerization ("duplication") associated with amplifying the DNA. We've also seen it in the RNA->DNA polymerization ("reverse transcription") of poly(A) sequences in the SHAPE labs. It may well have occurred in the transcription process of all the DAS lab experiments, but effectively hidden, either by disallowing more than 4 consecutive Gs or Cs in lab puzzles, or by experimental protocols that compensate for the lower RNA production of sequences that include long stretches of As that we saw in the early riboswitch puzzles.
I looked at the WT 23S sequence and there are 12 segments with a base repeating 5 or more times - 7 poly(G), 1 poly(C) and 4 poly(A). Of these, all but two of the poly(A) segments contain at least one mutation that would break up the sequence without causing an IUPAC violation. I think it's worth using a synthesis slot or two to see whether simply breaking up these sequences has any effect on the experimental results.
I'll suggest the hashtag #transcription be part of the title or description of any submissions designed to test this hypothesis.
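For anyone wanting to look for such repeats in their own design, here is a minimal sketch that lists every run of 5 or more identical bases. The function name `find_repeats` and the short test sequence are made up for illustration; it is not the 23S sequence.

```python
import re


def find_repeats(seq, min_len=5):
    """Return (start, base, length) for each run of one base that is at
    least min_len long. Start positions are 1-based."""
    pattern = re.compile(r"([ACGU])\1{%d,}" % (min_len - 1))
    return [
        (m.start() + 1, m.group(1), len(m.group(0)))
        for m in pattern.finditer(seq.upper())
    ]


# Hypothetical toy sequence: GGGGG, then AAAAAA, then CCCC (too short)
toy = "GGGGGACUAAAAAACCCCGU"
print(find_repeats(toy))  # [(1, 'G', 5), (9, 'A', 6)]
```

A design that breaks up a repeat should make the corresponding entry disappear from the output.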
Ghost ribosomal proteins
I have realized that there are a lot more parts of the ribosome for which we have not yet found out where they touch the RNA part of the ribosome.
Here is a part of the journey that led me to realize this. I was interested in watching the ribosome in action - with details of its movements. So I have been asking around.
First a bit from the silly department: I found out that science papers generally show the ribosome with the large subunit on the top and the small subunit at bottom. Skull like. I wonder if it has any preferred orientation in space? I mean it has as soon as it meets a mRNA. :)
Different ribosomes, different proteins caught on the film
I asked Omei what I should search for in PDB if I wanted to see the motions of the ribosome during translation.
Omei: In PDB site, I entered 'Structural characterization of mRNA-tRNA translocation intermediates' into the search and got a bunch of results. It looks like those that start with 4V6 all come from one paper.
I started to watch them. Instead of finding movements, I found something else.
I found a ghost ribosomal protein that is not present in the 4YBB Rhiju based his overviews on.
L31. It's kind of an elastic protein binding together the LSU with the SSU.
In 4V6O this spaghetti protein is curled up, while in the 4V6P it is more stretched out.
4V6O (left) and 4V6P (right), with 5S on top
L31, 5S with both tRNA's plus mRNA
I find it particularly interesting that L31 is touching 5S. (Since I believe 5S is a switch.)
Also there is a beautiful symmetry to this. There is a balance between the parts. One tRNA on each side of the L31 protein axis. Similarly, 5S sits in between the tRNA positions as well.
Bridges between the ribosome subunits
By the way, I found out that L31 was L31 by hovering over the protein in Chimera. That gave me the name of the chain, in this case B2 for both of them. (Heads up, the chain names change a lot in Chimera, so one always has to check the PDB entry.) Then I searched for B2 in one of the PDB entries:
Finding a name on a protein:
With the protein name in hand I could dig up papers. Here is the description I found:
" In addition, we show that the failure to identify L31 in many ribosome preparations is probably due to the protein's loose association with the ribosome and its ability to form various intramolecular disulfide bonds, leading to L31 forms with distinct mobilities in gels."
So L31 is an intersubunit bridge. The only pure protein of such a kind. There is a whole series of them and I haven't yet caught them in my spreadsheet.
So what makes L31 allow the stretch? Is that protein switching?
L31 doesn't look like any of the other proteins I have seen. Zero alpha helices. No beta sheets either.
Since this L31 sort of connects and holds together the LSU and SSU, what about connecting L31 to its two nearest ribosomal proteins - S14 and L5 - in one long protein? I'm aware that making longer ribosomal proteins hurts the assembly time of the ribosome.
e.coli with "fused proteins", S14 to L31 to L5
Antje shared a paper with me: Transcription Increases the Cooperativity of Ribonucleoprotein Assembly. (Sorry, it is paywalled)
Here is the main point I got from the paper. The ribosome starts to fold up with the ribosomal proteins even before it has finished being made. So everything is timed: specific ribosomal proteins bind before others, and some are dependent on other ribosomal proteins. The paper helped find out in what order some of the ribosomal proteins bind.
I was starting to consider that the ribosome may not fold up in the same timed states, that would make S15 and L5 be folded up in a good time so that they will make the L31 stick to the LSU and SSU and keep the ribosome together.
L31 looks kind of like shoulder straps keeping the ribosome's pants up. From what I understand it is crucial for protein making. The ribosome won't work, or will be really slow, without it. So my reasoning for wanting to attach its two nearest neighbour proteins was that the ribosome wouldn't fall apart. I have no idea either if it could then bind to the mRNA with the L31 tugged between S15 and L5. I was just thinking about the cool stuff Jewett lab has done with gluing the two ribosomal subunits together. I thought that perhaps the central protein holding the two subunits together would need to be fused to a protein from each subunit, for keeping the ribosome assembled. Perhaps just one of these proteins. So what I imagined may not be possible, if it messes with the binding order of the proteins.
Antje Krüger: I do wonder all these things myself. If we understand the binding order of the proteins better, what actually drives it and how well do they attach. If we understand their function more, we may be able to make smart fusions and alter their structure without messing with the assembly itself. Some proteins are involved in locking the rRNA structure, others are thought to be involved in protein synthesis.
Videos of the ribosome in movement
Antje shared this fine ribosome movie with me:
This made me realize that we didn’t have the binding sites of these assembly factors.
I bugged Rhiju too and he mentioned: There is a good MRC video that was commissioned by Ramakrishnan.
This is related to the one Antje mentioned. It is in higher resolution, but it lacks the names. So each of them is great in its own way.
If you are curious about Ramakrishnan, here is a fine video introduction to him. It is funny too.
Ghost proteins and more motifs to catch
To add in the Motifs and protein binding sites datasheet.
Missing SSU proteins
Missing LSU proteins
IF1, IF2, IF3, RF2
Where they typically touch the rRNA
Where it typically touches the rRNA
Where the growing amino acid chain typically touches the rRNA
Missing RNA motifs
Column for modified bases
Messing with the modified bases in the ribosome can disrupt the function of the ribosome.
Highlighting RNA motifs with a script
A while back I had an idea for highlighting e.g. all A-minor motifs in a design.
I used the script: Report/Mutate/Mark/Unmark Bases (v1.1) https://eternagame.org/web/script/9537916/
DigitalEmbrace volunteered to find the bases that violated motifs in the 23S lab designs.
I simply grabbed the 4 violations she identified in Dl2007's 2.24 puzzle (57, 189, 569, 1930) dumped them in the script and I can highlight them.
With 4 bases highlighted
What I imagine is that each of the RNA motifs can be grouped in a big chunk of comma-separated base numbers. So they can be highlighted either as a single group or all of them in one go. All it would take is to dump the prepared group of bases in the script to see the bases one wants to leave alone, or where specific RNA motifs live.
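Here is a small sketch of how such groups could be kept and turned into a comma-separated list for pasting into the script. The first group holds the four bases from the 2.24 example above. The second group, the group names and the function name `as_script_input` are hypothetical, and I am assuming the script accepts plain comma-separated base numbers as it did for those four bases.

```python
# Groups of base numbers. Only the first group comes from the text above;
# the second one is an invented placeholder.
motif_groups = {
    "2.24 violations": [57, 189, 569, 1930],
    "example motif": [12, 13, 14],
}


def as_script_input(groups, names=None):
    """Join the bases of the chosen groups (all groups when names is None)
    into one sorted, duplicate-free, comma-separated string."""
    chosen = list(groups) if names is None else names
    bases = sorted({base for name in chosen for base in groups[name]})
    return ",".join(str(base) for base in bases)


print(as_script_input(motif_groups, ["2.24 violations"]))  # 57,189,569,1930
print(as_script_input(motif_groups))  # 12,13,14,57,189,569,1930
```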
Super conservative list of safe mods
I have pulled together the bases in the ribosome to which we have not yet pinned a function like an RNA motif or a ribosomal protein binding site. It is a list of ideas for good places to start mutating.
However when I wanted to demonstrate how we could highlight all these “safe” bases ingame, I ran into trouble.
The new game markers are different, so I can't mark the bases the way I did before. I also tried to remove all marks before running the first script. (If some bases are highlighted already, I use this script to clear them: Mark Mutations (v0.7) (Eli's copy) https://eternagame.org/web/script/9597819/)
Anyway, I give you the base lists. The ability to highlight the bases in the lab puzzles I can’t give.
Very conservative list of safe mods (330 bases)
Super conservative list of safe mods (251 bases)
Very conservative list of safe mods (689 bases)
Super conservative list of safe mods (454 bases)
Here are the data sheets with the details for the individual bases in the ribosome.
Still Missing Ribosome Partners
We still have around 1/5 of the actors interacting with the ribosome not yet in the spreadsheet. However the list has gotten shorter. Antje has helped me with data for some of them.
Here are the yet missing ones.
Missing ribosomal proteins
IF1, IF2, IF3, RF2
Where they typically touch the rRNA
Where it typically touches the rRNA
Here is what I did:
1) Copy the base numbers out of the Very conservative list of safe mods column in the 5S Motif spreadsheet:
2) Open and reset the 5S lab puzzle so it is unmutated.
3) Call the [Booster] Report/Mutate/Mark/Unmark Bases (v1.1) (Eli's copy) (0)
4) Insert the base numbers in the booster
Very conservative list of safe mods (28 bases)
Super conservative list of safe mods (23 bases)
Now there are a lot fewer bases to concentrate on. This is what the data sheet potentially can do for the 16S and 23S puzzles too.
New experimental data for the pilot round’s 16S and 23S designs
Almost four weeks into the first round of the ribosome challenge, I have a third set of iSAT data for the pilot round designs for you. This time, I challenged the designs to fold and assemble in iSAT under RNA folding stress conditions: low magnesium (Mg) and low temperature.
Aim of the ribosome challenge
Before I explain the results, let me first explain what we can learn from these data. The aim of the ribosome challenge is to design a stabilized ribosome which folds more easily and is much more stable than the wild-type ribosome (for more information, please look back to the previous blog post Andy Watkins and I wrote mid-November, https://eternagame.org/web/blog/9618257/). This means that an EteRNA design has to beat the wild-type ribosome under standard iSAT conditions, which have been carefully optimized to maximize ribosome folding and consequent protein production. In principle this should be possible, because the wild-type ribosomal RNAs, when synthesized in iSAT, do not fold, assemble, and perform ideally – evolution has tested and selected ribosomal RNA sequences in living cells and not in the test tube! Hence, if life is robust, the ribosome ought to be evolvable, and thus optimizable, for this specific new environment. And therefore, it should be possible for EteRNA players to come up with ribosomal RNA variants that more easily and stably fold and assemble into better performing ribosomes in the test tube than the wild-type does.
What can we learn from testing EteRNA ribosome designs under folding stress conditions?
In my previous iSAT experiments, I compared the designs to wild-type ribosomes under standard iSAT conditions with optimal Mg and temperature regimes. In order to get more insights into the designs’ folding behavior, I tested the designs in iSAT under “no PEG and no extra DTT” conditions. As a reminder, DTT (dithiothreitol) is an antioxidant (prevents oxidative damage) which helps ribosomal subunit synthesis and assembly. PEG is a crowding agent creating a more cell-like environment, which might keep assembled ribosomes from falling apart. Thus, omitting PEG and DTT placed the ribosomes under a bit of extra stress, where conditions were not optimal for folding and function.
How do magnesium and temperature affect RNA folding?
In order to get more insights into the folding behavior of the pilot round’s 16S and 23S designs, I tested the designs in iSAT in low Mg and low temperature regimes. Mg generally helps RNA folding through two mechanisms. First, there is a general electrostatic effect, since the RNA backbone is a polyanion (multiple negative charges) and Mg is positively charged. Second, there can be specific interaction geometries that are particularly favorable when the Mg includes phosphate oxygens in its octahedral coordinate geometry. (Some of these terms might be unfamiliar, and that’s okay. Basically, Mg tends to coordinate with six electronegative atoms, especially oxygen or nitrogen, and the best geometry for that is “octahedral” – like three perpendicular x,y,z coordinate axes.) One classic example of an RNA structure that is stabilized by a specific Mg binding interaction is the HCV IRES Domain IIa (PDB ID: 2PN3). So, lowering the Mg concentration asks a design to fold into the target structure with less “help” from Mg.
Using a lower temperature, in contrast, has multiple effects. First, it slows down RNA synthesis, which should give the newly synthesized RNA more time to fold. Specifically, folding may be more local and co-transcriptional – that is, if the first 20 nucleotides (A, U, C, G) of an RNA can form a structure, and the first 30 nucleotides of an RNA can form a more stable structure, then you might be slightly more likely to end up with the first, less stable structure because synthesis will be slower relative to folding. This is a simplified picture, of course – I’m speaking in absolute terms rather than probabilities – but I hope it’s illustrative. Relatedly, since lower temperature means there is less thermal energy available, RNA folds have a harder time resampling themselves to “fix” suboptimal structures. If you have a steep energy landscape around a mis-fold, where, say, any set of three base pairing changes are very destabilizing, but there is a set of six base pairing changes that is very stabilizing, then the RNA will have an easy time finding those six changes at high temperature and a very hard time at low temperature.
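To get a feeling for how much "less thermal energy" matters, here is a rough sketch using a simple Arrhenius picture, where the rate of crossing an energy barrier scales as exp(-barrier / (R*T)). This is a textbook simplification and not a model of iSAT. The barrier heights are hypothetical and picked only for illustration, and 37 °C versus 30 °C are used as the two temperatures to compare.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)


def relative_rate(barrier_kcal, temp_low_c, temp_high_c):
    """Rate of crossing a barrier at the low temperature divided by the rate
    at the high temperature, assuming rate ~ exp(-barrier / (R * T))."""
    t_low = temp_low_c + 273.15
    t_high = temp_high_c + 273.15
    return math.exp(-barrier_kcal / R * (1 / t_low - 1 / t_high))


# Hypothetical barrier heights in kcal/mol
for barrier in (10, 20, 40):
    ratio = relative_rate(barrier, 30, 37)
    print(f"barrier {barrier} kcal/mol: rate at 30 C is {ratio:.2f}x the rate at 37 C")
```

In this picture, the higher the barrier between a mis-fold and the fix, the more the lower temperature slows the escape, which is the qualitative point made above.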
Of course, in iSAT, both Mg and temperature changes also affect the extract’s metabolism needed for RNA synthesis and folding, ribosome assembly and protein production, but this “background influence” should be the same, independent of the tested designs. So, we really only have to think about the influence of Mg and temperature on rRNA folding, in particular the thermodynamic and kinetic effects above. Therefore, the performance of a design under these challenging conditions provides additional information about a design’s folding success and stability.
Experimental data for the 16S designs
Now, the data: First, the results for the 16S designs. On the top you can see how the designs performed compared to the wild-type under standard iSAT conditions with optimal temperature and Mg (37 °C and 10 mM Mg) in this experiment. Similar to my previous data presentations, on the left you find the GFP production over time and in the middle and on the right the final amount of GFP made (maxGFP). The second row shows how the designs performed at optimal temperature but only 5 mM Mg, and the last row when in addition to the low Mg regime also the reaction temperature was reduced to 30 °C. All data are normalized to the maxGFP of the wild-type under optimal temperature and Mg regime. In addition, I adjusted the y-axis of the plots individually and did not keep it the same for all.
As you can see, lowering the Mg concentration to 5 mM reduced GFP production by the wild-type and all designs – some were more affected than others. Interestingly, lowering also the reaction temperature to 30 °C was a little beneficial, and also here, some designs were more affected than others.
You probably noticed that in this experiment the designs behaved a bit worse compared to the wild-type and that there is no detector saturation indicated anymore. The main difference to before is that in this new set of experiments I used a lower reaction volume and a different machine for GFP fluorescence detection. This might have affected the designs’ performance compared to the wild-type. Since I don’t know the reason for the discrepancy yet, and don’t have more data, I suggest considering both the old and the new data as valid.
Experimental data for the 23S designs
Now the 23S designs: All data are presented as for the 16S designs. Also here, the stress conditions influence some designs more than others, and the results for the standard condition vary a bit compared to the previous experiment.
Coming next, I will analyze the time course data on the left a bit further. These additional analyses might provide more insights into the effects of the stress conditions. So, you can expect additional data soon.
Antje (Antje Krüger from the Jewett lab at Northwestern University)
Speed Evolution by breaking IUPAC
I think that we should not take the IUPAC violations too seriously. I think we can get away with breaking some of them. Here is why.
Last November Andy and Antje explained how evolution works. Here is an excerpt:
"We like to talk about how sequence covariation can be used to support particular secondary structures. For example, if bases 1 and 10 both vary a lot, but they are always complementary to each other, we have reason to suspect that they are base paired to each other. But this sort of coordinated change isn’t how evolution actually samples sequences over time. Mutation rates in E. coli, depending on who you ask, are between one and two mutations per 10,000 generations per genome, and any particular nucleotide will mutate once per two hundred million generations. That means that the chance that a particular nucleotide, as well as its base pairing partner, will mutate in the same generation are extremely low. Since there have been many individual E. coli, it has surely happened before, but on average probably about once per E. coli lineage. Really, then, the way that covariation works is that a single nucleotide mutates in a way that does not totally break the fold, and then later, its base pairing partner makes a compensatory mutation. Evolution can’t sample big, structural changes; it traces a gradual path of changes that aren’t fatal."
From this post: https://eternagame.org/web/blog/9618257/
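To put a rough number on "extremely low", here is the arithmetic for the per-nucleotide rate in the excerpt. It is a back-of-the-envelope sketch that assumes the two mutations happen independently of each other, which is my simplification.

```python
# One mutation per nucleotide per two hundred million generations (from the excerpt)
per_site = 1 / 200_000_000

# Chance that a given nucleotide AND its pairing partner both mutate in the
# same generation, assuming the two events are independent
both_sites = per_site ** 2

print(f"one site:   {per_site:.1e} per generation")    # 5.0e-09
print(f"both sites: {both_sites:.1e} per generation")  # 2.5e-17
```

So a specific double mutation in one go is about two hundred million times rarer than a specific single mutation.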
All mutations in single base areas have been tested plenty by evolution but not so much the base pairings.
However, changing a base pair radically takes a double mutation, which is extremely unlikely to happen in one go. So base pairs will tend to change in several steps. A base pair may change like this: a GC may get one mutation and become a GU. This GU may later become an AU.
However, what has not been thoroughly tested by evolution is a GC becoming a CG or UA in one go. Basically we can get more bang for our mutational buck if we specifically target every base pair all over the ribosome that is not involved in RNA motifs or touched by ribosomal proteins, and double mutate it.
The further away from the original base pair, potentially the better, as for what is most likely to not have been tested. Which means flips.
Normally when a single point mutation happens, a G becomes an A (keeping it in the purine family) or a C becomes a U (keeping it in the pyrimidine family). We want to stray as far from normal as possible, to test what evolution has not been testing. So this means we purposefully make G into C's, C's into G's, A's into U's, U's into A's.
This kind of change is also likely to cause some structural changes. So not every such double mutant is likely to be for the better.
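The flip described above (G to C, C to G, A to U, U to A on both partners of a pair) can be sketched like this. The function name `flip_pair` and the toy sequence are made up for illustration.

```python
# Each base is replaced by its Watson-Crick complement
FLIP = {"G": "C", "C": "G", "A": "U", "U": "A"}


def flip_pair(seq, i, j):
    """Flip the bases at 1-based positions i and j, so a GC pair becomes CG
    and an AU pair becomes UA. Note that a GU pair would become CA, which no
    longer pairs, so this is only meant for GC and AU pairs."""
    bases = list(seq)
    for pos in (i, j):
        bases[pos - 1] = FLIP[bases[pos - 1]]
    return "".join(bases)


# Toy hairpin: position 1 (G) pairs with position 7 (C)
print(flip_pair("GGAAACC", 1, 7))  # CGAAACG
```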
Hypothesis: Evolution by base-pair substitution
What I wish to particularly highlight from Andy and Antje’s post:
"Mutation rates in E. coli, depending on who you ask, are between one and two mutations per 10,000 generations per genome, and any particular nucleotide will mutate once per two hundred million generations. That means that the chance that a particular nucleotide, as well as its base pairing partner, will mutate in the same generation are extremely low."
My hypothesis is that by systematically double mutating any base pair that isn’t involved in critical structure, we will greatly increase our chance of testing something that evolution has not yet gotten around to.
Join the lab experiment
Feel free to join the experiment.
Proposed hashtags for the design title:
#Substitution #IUPAC violation
What to do:
Pick a base pair that is not part of an RNA motif
and is not touched by a ribosomal protein.
Spreadsheets with positions of RNA motifs etc.
Make a double mutation to a base pair. It is no problem if it violates IUPAC. The more, the merrier.
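The "pick a base pair" step above could be sketched as follows. It assumes the target structure is available in dot-bracket notation without pseudoknots, and that the motif and protein binding site positions have been collected into one set from the spreadsheets. The function names and the toy data are hypothetical.

```python
def base_pairs(structure):
    """Return the (i, j) base pairs, 1-based, of a dot-bracket structure."""
    stack, pairs = [], []
    for idx, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(idx)
        elif char == ")":
            pairs.append((stack.pop(), idx))
    return sorted(pairs)


def eligible_pairs(structure, protected):
    """Keep only the pairs where neither base is in the protected set."""
    return [
        (i, j) for i, j in base_pairs(structure)
        if i not in protected and j not in protected
    ]


# Toy hairpin with three pairs; base 2 is pretended to be in an RNA motif
toy_structure = "(((...)))"
protected_bases = {2}
print(eligible_pairs(toy_structure, protected_bases))  # [(1, 9), (3, 7)]
```

Each pair in the output is a candidate for a double mutation.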
Neck mutations: Far range stabilization
I have an idea I think we can use in combination with the base-pair substitution idea above.
Back in the static RNA labs I was testing two almost identical designs. The only difference was a flipped AU base pair in the neck.
A neck is the outermost stem that is folded together of two far away strands. For further explanation see Neck definition
Both designs were winners but had different scores. But they were within error according to Rhiju.
However, the reason why I'm particularly interested in it in relation to substitution of base pairs is the following.
Watch the blue in the stems between the two designs. The blue means that this area is stable - not accessible to chemical probes. (The data is shown in full blue and full yellow and not showing nuances. For reading SHAPE data see this post Intro to SHAPE data)
Concrete example of base-pair substitution
Are the blue areas in the same places in both designs? E.g. watch the stem with the triloop.
What I think is interesting is that it is not the same stem areas that are deemed stable in both designs, despite there being only one base pair flip in the neck that is different between the two designs.
So by flipping a pair in the neck one can change the stability in an area far away.
I think this is yet another way to gain extra mutational power, for just one base-pair substitution. We can potentially have long range effects and change stability far away even without touching this area directly.
Background post: Different types of necks and their effects on the main design
If you are up for testing this too in lab, just add #neck in your lab design title.