Shortly after the Wired article about Eterna, a biochem student/scientist paid our game a visit. I asked him about his background. He said he had taken a biochem class with Harry Noller, who had studied the ribosome. He then showed me a picture that looked like nothing I knew from the game. The closing base pairs in the multiloops weren’t all following the GC-orientation rule, though many were. And they weren’t even all GC-pairs. The loops were odd too. Here is the exact picture he showed me.
It made me start thinking about why natural RNA looked so different from ours. Today a thought popped up in my head. What if the Branches lab instead looked like this?
I suspect neither shape will be solvable with as many same turning GC-pairs in the multiloop, or by following the G-pattern in the end loops, as the winning designs did for the original shape.
Cloud lab 6 - Cross follows both the orientation of GC-pairs pattern in multiloops and the G-pattern almost perfectly.
Example of a design that follows the G-pattern 100%.
Notice the overall tendency of the winners in this lab to do the same.
But in the branches there are exceptions, both to the multiloop pattern and to the G-pattern.
Outliers marked with stars.
If you look at the branches lab, it has repeat structures: 3 multiloops of almost similar size and two twin branches. Whereas in the cross lab, there is only one multiloop. It kind of makes sense that there is more variation and straying from the main pattern than in the simpler Cloud lab 6 - Cross.
What if the really big RNAs look like they do as a means to avoid too much repetitive pattern? What if the number of element repeats matters? The bigger the structure, the more similar elements there will be. More multiloops and more stems of the same size. How to keep those from mispairing?
I think what I’m saying is this: even when a certain pattern is superior for an element in a small puzzle (the most prevalent in winning designs) and will overall work well when used, it can cause misfolds if it is repeated enough times. As several Eterna players before me have noticed, designs crave variation.
I can see that if an element is repeated many times in a small design, the elements start to stray from the usually most stable pattern. I see that 1 nt bulges are fond of having double C’s at the bottom of the bulge, but if there are many 1 nt bulges in a puzzle, a more varied solving pattern for the bulge is often favored. So it seems that sometimes using a less stable pattern will be superior to having the overall best pattern repeated too many times. The superior pattern can rule, but only until too much pattern repetition becomes an RNA structure killer.
While I was working on this post, Janelle threw a line in the chat. I think she was actually talking about lab details as she was giving a fine lecture on lab technicalities. But I think it sums up my conclusion very well.
The smaller the RNA, the better the prediction
Let me change it to:
The smaller the RNA design, the easier the rule prediction.
I think the small lab puzzle sizes we are working with will help us find strong overall patterns. If this is correct, I think the challenge for us will be when and how to best stray from the underlying rules, when the size of the bigger lab puzzles later forces us to. I think we will have to find a new layer of rules for the big labs.
I think the reason why more same turning base pairs are tolerated now comes down to the simple fact that we have longer sequences now. They are simply more needed, as having all base pairs twisted would also create unwanted complementarity. I actually think I understand why huge RNA wants so many same turning GC-pairs. It is simply to help avoid complementarity. And the reason the bots did badly in lab was that they had way too many double or longer same turning pairs in rather short RNA sequences, this way creating tons of ways for sections of the RNA to be complementary to themselves. They simply had too many options to pair up otherwise with themselves. If a pattern is complementary to another pattern, both are repeated with too high frequency, and they have a rather strong pull (I mean GC or other Watson-Crick base pairing, as opposed to non-complementary binding), then the likelihood that a mispairing will occur is simply bigger. In particular if that pattern is not locked safely away inside longer stems in a structure that is made stable at the closing base pair spot and the pair next to it.
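A rough way to put a number on this idea is to count how many stretches of a sequence could pair with another stretch elsewhere in the same sequence. The sketch below is only an illustration, not an energy model: the function names, the window size `k=4` and the choice to count Watson-Crick pairs only (no GU) are assumptions made for the example and do not come from the labs.

```python
# Illustrative sketch: count competing pairing partners inside one sequence.
# Assumptions: Watson-Crick pairs only, fixed window size k, no energy model.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the sequence that would pair with seq in an antiparallel stem."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def complementary_kmer_pairs(seq, k=4):
    """Count pairs of non-overlapping windows (i < j) of length k where the
    window at j is the reverse complement of the window at i."""
    positions = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i)

    count = 0
    for kmer, starts in positions.items():
        partner_starts = positions.get(reverse_complement(kmer), [])
        for i in starts:
            for j in partner_starts:
                if j >= i + k:  # second window lies fully after the first
                    count += 1
    return count


if __name__ == "__main__":
    for name, seq in [("single stem", "GGGGAAAACCCC"),
                      ("repetitive GC", "GCGCGCGCGCGC")]:
        print(name, complementary_kmer_pairs(seq))
```

Comparing a repetitive sequence against a more varied one of the same length gives a feel for how quickly the number of alternative pairing options grows when the same strong pattern is reused.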
I considered double same turning GC-pairs particularly risky at the closing base pair spot, in multiloops in particular. But now they pass if not used too heavily. Double AU base pairs still aren’t welcome at that spot, but I know they are used there too in really big natural RNAs that are much longer than our sequences. However, double same turning AU’s have also shown themselves to be very useful, in particular in the middle of longer stems. Even double GU’s are allowed that way sometimes.
So now double same turning quads have shown themselves to be more legal and even called for. In particular in designs with longer stems, where they also went fine in many Classic Eterna labs. Same turning GC-pairs were also called for back in classic lab in pressured designs with many short stems. Something I was wondering about. I think it comes down to something as simple as variation helps break repetition. Since the stems were short, all of them could not use the more optimal patterns for mid length stems, so the consequence was double same turning pairs for more of these short stems.
So I think what I stated in My Strategy Guide for Lab, turns out to be even truer than what I expected:
“Bad pattern depends partly on location. What is bad pattern style is not necessarily bad pattern in a longer and more tolerant stems.
Every pattern is a bad pattern if repeated enough times in the same design. Every pattern is bad if put in the wrong place.”
It also now makes more sense to me that the bigger multiloops in this monster huge RNA do not follow the orientation I expected from the more middle sized multiloops in our much smaller RNA designs. The tendency was for the bigger multiloops to care much less about what to us is the more usual orientation, and rather go mixed or reverse orientation. I think there are more rules to discover for these big multiloops.
I don’t know, but I have an idea that the scientist bots take their patterns not just from energy calculations, but also from naturally occurring RNA. Could it be that the bots have taken these patterns from RNAs much different in size than those we are handling, or from a mixture of sizes? I think it matters a great deal for reuse of patterns that one takes them from a similar sized design, with a similar length of stem and positioning of that stem. I think a stem, despite being the same length, takes a different solving depending on whether it is adjacent to another stem, placed on a more relaxed multiloop or tucked onto an internal loop. I think it matters what is next to it at both ends.
Or are there player puzzles you’re more interested in? For example, there may be some that were unsolvable by bots even 'in silico’. What if players could not only solve them in silico but also in vitro?
Does RNA have domains like proteins do?
I had read about proteins having domains and this was what led me to wonder. So I asked Rhiju if RNA has domains like proteins do.
Rhiju: Hmm interesting. I think it’s an open question how to prevent different segments of an RNA chain from base pairing to each other, although soon we’ll want to have rules for this separability when we try to compose 3D structures from designed domains.
In proteins, domains are sections that can have a function on their own. It’s kind of like an autonomous working unit.
From what I understand, in proteins there can be multiple domains with identical structure and identical sequence, and they work fine together with no misfolding. So I think protein domains are different from RNA domains in a fundamental way.
What is a domain in RNA
When it comes to RNA and domains, I think of domains kind of like a section.
Elements are not a section, a section is made of elements. I think of a section typically as separated by gap bases or it could be a branch coming off from a multiloop.
For static designs I think of a structure like a multiloop and its attached stems could be a section.
However I think for switches an RNA domain can be as small as two stems and an aptamer.
E.g. in one of the Top notch designs, the switching seemed to occur mainly in the small stem after the aptamer. I can’t know for sure how much of the structure is unnecessary for allowing the switch in the aptamer, but I imagine that if just the static end of the aptamer is long and stable enough, perhaps a bit longer than in this actual design, then it should be possible to get this mini section to switch on its own. So that could be an example of a domain in RNA.
I have earlier been kind of angry at designs with adjacent stems in multiloops, because in the beginning I saw them turn up mainly in designs that didn’t have many winners. I originally thought it necessary to have a base or more between the different elements to keep them apart from each other. But I found that it was mainly designs with adjacent multiloops and also very short stems that caused the biggest problems.
I think length of stem, and varying stem length, is one of the keys to keeping domains and sections in an RNA design separate from each other. As is varying the loop bases and the gap distance (unpaired bases).
However I still think that having a few single bases between different elements can help keep stems from too easily interfering with each other. I’m also aware that adjacent stems can actually help stabilize each other via coaxial stacking. Which leads me to wonder: Does coaxial stacking only happen in longer stems?
Repeat structures - and their effect on the RNA fold
I think RNA does not have domains in the same way as proteins, with both identical sequence and structure. We have had labs with repeat structures. The branches lab was one of them.
Even when there was a solution for a section that was absolutely strongest and best, it wouldn’t stay strongest and best if repeated in multiple identical sections.
What I have learned over time is that one can’t keep repeating the best sequence, either on an element basis or a section basis, to solve identical elements or sections. If you do that you won’t get the same structure. I recall how Dimension9 kindly pointed out in one of my first lab designs that my design was almost too symmetrical. That the stems were too similar.
This varying of sequence in identical length stems is something Mat has been practicing from the start and has pointed out in his lab designing strategy. When there is more than one of the same element, you need to vary your solve and pick the next best option for solving the structure. This is why he has a whole list of next elements to pick when the best option has already been used.
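The "pick the next best option when the best is already used" idea can be sketched in a few lines. This is only a toy illustration of the principle, not Mat's actual list or method: the function name, the element names and the ranked solve labels are all hypothetical placeholders.

```python
# Toy sketch of "use the best solve once, then move down the ranked list".
# The element names and solve labels below are hypothetical placeholders.

def assign_distinct_solves(elements, ranked_solves):
    """Give each identical element the best-ranked solve not used so far.

    elements      -- names of the repeated elements, in design order
    ranked_solves -- candidate solves for that element, best first
    """
    used = set()
    assignment = {}
    for name in elements:
        for solve in ranked_solves:
            if solve not in used:
                assignment[name] = solve
                used.add(solve)
                break
        else:
            # More repeats than known good solves: the situation the post
            # warns about for designs with many repeat structures.
            raise ValueError(f"no unused solve left for {name}")
    return assignment


if __name__ == "__main__":
    stems = ["stem_1", "stem_2", "stem_3"]
    ranked = ["best_solve", "second_best_solve", "third_best_solve"]
    print(assign_distinct_solves(stems, ranked))
```

The error case is the interesting part: once there are more identical elements than strong solves, something has to give, either weaker solves or repeated sequence.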
Repeat sequences - and their effect on the RNA fold
So imagine this. You have an unfilled RNA string of A’s. Then you fill in two identical letter sequences that would each fold into the same secondary structure if they were alone. Now, will they fold into the identical structures their sequences carry in them? OK, you can often get away with two, without misfolds and with the identical, intended secondary structure forming.
It also depends on the length of the repeat sequences, their intended structure and the number of repeats. If they are fairly short, like 4-5 bases, and there are 2-5 identical copies of them, and they are close in space, you can abuse their similarity to get a switch to happen. :)
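To see how many short repeats a sequence actually carries, one can simply list every stretch of a given length that occurs more than once. A minimal sketch, assuming a plain string of A/C/G/U letters; the function name and the example sequence are made up for illustration.

```python
# Minimal sketch: list direct repeats of length k in an RNA sequence.
# The example sequence is invented; it is not taken from any lab design.
from collections import defaultdict


def repeated_kmers(seq, k):
    """Map every length-k stretch that occurs more than once to the list
    of positions (0-based) where it starts."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: starts for kmer, starts in positions.items()
            if len(starts) > 1}


if __name__ == "__main__":
    # Two copies of the same 5-base stretch placed in a string of A's.
    seq = "AAAA" + "GGACU" + "AAAA" + "GGACU" + "AAAA"
    for kmer, starts in sorted(repeated_kmers(seq, 5).items()):
        print(kmer, starts)
```

The start positions also show how close the copies are in sequence, which matters for how easily they can swap partners.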
Repeats and switches
Preferably I want these short 4-5 base repeat switch sequences backed up in a tiny corner of the design (in the obligate switching area or the unpaired bases around it) - having the main part of the design static and non moving - so they have little other choice than to jump between their intended target/s. So I think playing a game of non-complementarity between the static part of the design and the switch repeat sequences is a successful way of keeping the switching part of the design from interfering with the part that is meant to be static.
It mainly depends on the size of the repeats, the stem lengths and a bit on their position in relation to each other. If they are close in sequence, misfolds happen more easily; similarly, if they are close in 3D space, misfolds happen more easily too.
Repeats and stem length
I bet if stem lengths are generally short, repeats will start interfering much faster - even with just two identical sequences. But if stems are long enough or gaps between them are long enough, you may still have them folding into their intended identical structures.
The branches lab has two identical sub branches. A strange pattern ran through the whole lab for the second of the branches. Something which caused me quite a bit of wondering. I did not understand why the second branch would need a pattern I would normally think of as inferior.
This design has repeat structure.
In both these designs and this lab, many of the winning designs used this normally less secure pattern - as a way to create variance in the sequence and avoid misfolds.
So the need to vary the sequence of identical structures, to keep the sequences from pairing up, is what I now think is the source of the double AU’s in the second branch. A pattern that is regularly allowed in stems, but is more risky next to a closing base pair, as it sometimes breaks open. I counted the solve for the second branch as weaker than that of the first branch.
None of the designs in the Branches lab have identical sequence in the two structurally identical branch sections. However I bet if the experiment were run again with the winning designs and just one of the branches were mirrored onto the other (and vice versa), then the main part of the designs would not do nearly as well as their originals.
So if the sequence is in the structure, it depends a lot on what that sequence is also in sequence with.
Here is an example of repeat elements. Two 4 base pair stems with identical closings but different middles, and most of the winners used a pattern at one spot which I considered less good, just like in the branches lab. Notice that the two other small 3 base pair stems, which are also repeat elements, are solved differently too.
Sum up on protein versus RNA domain
Protein domains - Both structure and sequence are identical for multiple identical domains
RNA domains - Sequences start to differ for multiple identical structures. In RNA, two identical target structures generally mean two different sequences. Two identical sequences may also not yield two identical structures.
Two longer repeat sequences, normally will not mean two identical structures. Three repeat sequences and you won’t recognize the original structures you thought were going to form.
Two repeat structures mean two identical structures with different sequences (if the sequences do not differ, you are at risk of misfolds). Three repeat structures and the sequences definitely have to differ some.
If you use many more repeat structures, you simply don’t have enough strong solutions to solve the individual sections or elements - plus you gather repeat in sequence, due to similarity in sequence between the optimal solves for an element or section.
For RNA the structure is not only the sequence, as it is for proteins. The structure is the sum of the sequence + how big the potential is for it and its surrounding sequences to fit better elsewhere.
That factor for misfolding is something that to some extent can be controlled, at least in smaller lab designs, by things like not making stems too short, varying stem length and not keeping too much repeat structure. So simply by raising the length of stems and keeping stem lengths and gap lengths varied, you also build in some security against misfolding. Controlling the ratio of GC, AU and GU pairs is a way of doing the same.
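These things are easy to check on a design before submitting it. The sketch below reads a target structure in dot-bracket notation, lists the stem lengths and counts GC, AU and GU pairs. It is a simple illustration under stated assumptions: balanced brackets, no pseudoknots, and a stem is taken to mean an uninterrupted run of stacked pairs, so a bulge splits a stem in two. The function names and the example design are invented.

```python
# Sketch: stem lengths and base pair composition from a dot-bracket target.
# Assumptions: balanced brackets, no pseudoknots, a stem is an unbroken
# run of stacked pairs. The example sequence/structure is made up.
from collections import Counter

PAIR_LABELS = {"CG": "GC", "AU": "AU", "GU": "GU"}


def base_pairs(structure):
    """Return the sorted list of (i, j) index pairs for a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return sorted(pairs)


def stem_lengths(structure):
    """Return the length of every stem, in 5' to 3' order of its outer pair."""
    pair_set = set(base_pairs(structure))
    lengths = []
    for i, j in sorted(pair_set):
        if (i - 1, j + 1) in pair_set:
            continue  # not the outermost pair of its stem
        n = 1
        while (i + n, j - n) in pair_set:
            n += 1
        lengths.append(n)
    return lengths


def pair_composition(sequence, structure):
    """Count GC, AU and GU pairs; anything else is counted as 'other'."""
    counts = Counter()
    for i, j in base_pairs(structure):
        key = "".join(sorted(sequence[i] + sequence[j]))
        counts[PAIR_LABELS.get(key, "other")] += 1
    return counts


if __name__ == "__main__":
    structure = "((((...))))..(((...)))"
    sequence = "GGAUAAAGUCCAAGCGAAACGC"
    print(stem_lengths(structure))
    print(dict(pair_composition(sequence, structure)))
```

Many stems of the same length in the output is a hint that the design has a lot of repeat structure and may need extra variation in sequence.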
If you gather many repeat structures - you in practice enforce close to identical repeat sequences, and the result is misfolds. Because of the similarity in their sequences, they are also compatible with each other. If you have two repeat structures and you space them well enough and the stems are not short, you may get away with it. But adding more, you are asking for trouble. And this is why the bots regularly come up short on our puzzles, as our puzzles often harbor repeat structures and same length stems to a degree which I haven’t seen in any natural RNA.
When repeat structures and repeat sequences are plentiful - you will get the misfold from hell. :)
If you use repeat sequence, you don’t get repeat structures
If you use repeat structures, you don’t get repeat sequence.
Which leads to: If you both use repeat structures and repeat sequence, you are doomed. :)
At least if you are not aiming for a ginormous beautiful misfold. :)
Advice on making RNA domains
To make an RNA domain and keep it intact - it will help you to ensure that it is not identical to anything else in the RNA design. Make it different in sequence and structure. If you want two identical domains - make sure both their sequence and structure differ slightly. Same goes for elements - at least if they are close.
Why vary both structure and sequence for similar domains?
One may get away with identical structures, if one varies the sequence a bit. One may get away with more sequence similarity, if one varies the structures a bit. (Provided the two domains are not large and ...)
However, tilt both factors just a notch and it gives a much stronger hand against misfolding. Structurally identical enough to perform the same function, but different enough in sequence to not misfold. You stack the bases in your favor and raise the chances of getting a good fold. :)
RNA loves one chicken foot, it likes two chicken feet, but 3 chicken feet is a monster.
Rhiju, I was reading one of your papers that is related to ribosomes: RNA regulons in Hox 5′ UTRs confer ribosome specificity to gene regulation. (Open access, check here, search for the title and choose paper)
I find this bit of the paper particularly fascinating:
"To date, only a small class of viral IRES elements have been shown to interact with both the large and small ribosome subunits to form a translationally competent 80S ribosome [30, 31]. However, a biotinylated full-length Hoxa9 5′ UTR, as well as the minimal IRES element contained within nt 944–1,266, are able to pull down ribosomal proteins from both the large and small subunits, including RPL38 (Fig. 2c, d). The full-length 5′ UTR also pulls down both 28S and 18S rRNAs (Fig. 2e), suggesting that the 80S ribosome is able to form on the uncapped Hoxa9 5′ UTR."
Translated: Basically they found out that some messenger RNA (mRNA) had two ways to get translated.
Normally mRNA is capped, which is a way for the ribosome to check that it is translating mRNA from the cell and not some opportunistic viral RNA. Viruses have found an alternative way around it. Some viruses have an IRES element which can help pull the two ribosomal subunits together and make the ribosome assemble, without the usual starting machinery that is normally necessary for translating the cell’s own mRNAs.
But some mRNAs also had that special IRES code that viruses use, plus a cap, and despite having the correct cap, they got translated via the IRES element when body parts like the skeleton needed to grow.
The IRES element reminded me of this sly fellow. An octopus dragging together two coconut shells to hide itself - displaying a surprise use of tools. Here is one octopus that has perfected the art. :)
Now I think I understand why the ribosome needs to be in subunits. Proteins don’t seem to dig doing bigger switches in structure unless there is something like a pH change or a ligand binding which adds energy. It would be impractical and impossible for the cell to change pH each time a new peptide bond was to be made. :)
Hmm, okay, proteins are mainly surrounding each of the RNA subunits, and the core is RNA. But then again, I don’t think that RNA fancies doing such big switches either. So still the same result. So what proteins especially, but also RNA, can’t achieve when alone, they can achieve together. Now it also makes sense that the core is RNA. As it is better than protein at performing switching when in a much smaller version, and since the early world of life is thought to have been an RNA one.
Earlier I have kind of been thinking of most of the ribosomal RNA as space filling RNA, with protein strings attached to keep the whole thing together. I only imagined a few sections of the RNA to actually have a specific function, by having a shape that allowed space for holding a tRNA codon. However, what I am starting to realize from the paper Rhiju linked is that the ribosome has been built in layers, and each layer tends to add a new function on top of the existing ones.
However I do think that some of the small hairpins on the multiloops in the multiloop highways are really just space fillers, and are there more to make sure the multiloops get identity variation by having different stem counts and lengths, so they don’t misfold.
Now I wonder if the asymmetric nature of RNA is part of the explanation of why RNA is harder to crystallize compared to proteins?
Proteins are far more ordered and compact in their structure. Proteins form beta sheets, where side chains line up with each other in a repeated and regular manner. Even their alpha helices have ordered structures in themselves and even more when more of them line up side by side.
On a higher level, proteins also often use repeating domains/units. All of this adds up to higher orderliness. Order means denser structure and higher crystallinity, meaning it should be easier to get the structure by X-ray.
RNA rarely has bigger stem regions line up with each other. Though RNA can have coaxial stacking, where two stems line up with each other, which adds stability and an energy bonus.
Proteins also generally contain far more symmetry compared to RNA - that’s except for some very symmetric RNA switches ;). There is some symmetry to some higher order RNA structures, but not if one zooms in and looks at the details.
RNA seems fractal in an asymmetric way. Needing variations on all levels, in particular the bigger it gets.
I suspect the asymmetric nature of RNA has a good deal to do with why it is harder to obtain x-ray structures from RNA compared to protein.
It all makes sense now. :)
Now I also wonder if the RNA designs with coaxial stacking are easier to crystallize, than similar sized RNA designs without?