Huge RNA

  • Updated 3 years ago
Huge Natural RNA

Shortly after the Wired article about Eterna, a biochem student/scientist paid our game a visit. I asked him about his background. He said he had taken a biochem class with Harry Noller, who had studied the ribosome. He then showed me a picture that looked like nothing I knew from in game. The closing base pairs in the multiloops weren’t all following the GC-orientation rule, though many were. And they weren’t even all GC-pairs. The loops were odd too. Here is the exact picture he showed me.

It made me start thinking about why natural RNA looked so different from ours. Today a thought popped up in my head. What if the Branches lab instead looked like this?

Or this?

I suspect neither shape will be solvable with as many same turning GC-pairs in the multiloop, or by following the G-pattern in the end loops as closely, as the winning designs did for the original shape.

Cloud lab 6 - Cross follows both the orientation of GC-pairs pattern in multiloops and the G-pattern almost perfectly.

Example of a design that follows the G-pattern 100%.

Notice the overall tendency of the winners in this lab to do the same.

But in the branches lab there are exceptions, both to the multiloop pattern and to the G-pattern.

Outliers marked with stars.

If you look at the branches lab, it has repeat structures: 3 multiloops of almost similar size and two twin branches. Whereas in the cross lab, there is only one multiloop. It kind of makes sense that there is more variation and straying from the main pattern than in the simpler Cloud lab 6 - Cross.

What if the really big RNAs look like they do as a means to avoid too much repetitious pattern? What if the number of element repeats matters? The bigger the structure, the more similar elements there will be: more multiloops and more stems of the same size. How to keep those from mispairing?

I think what I’m saying is that even when a certain pattern is superior for an element in a small puzzle (the most prevalent in winning designs) and will overall work well when used, it can cause misfolds if it is repeated enough times. As several Eterna players before me have noticed, designs crave variation.

I can see that if an element is repeated many times in a small design, the elements start to stray from the usually most stable pattern. I see that 1 nt bulges are fond of having double C’s at the bottom of the bulge, but if there are many 1 nt bulges in a puzzle, a more varied solving pattern for the bulges is often favored. So it seems that using a less stable pattern will sometimes be superior to having the overall best pattern repeated too many times. The superior pattern can rule, but only until too much pattern repetition becomes an RNA structure killer.

While I was working on this post, Janelle threw a line in the chat. I think she was actually talking about lab details, as she was giving a fine lecture on lab technicalities. But I think it sums up my conclusion very well.

The smaller the RNA, the better the prediction

Let me change it to:

The smaller the RNA design, the easier the rule prediction.

I think the small lab puzzle sizes we are working with will help us find strong overall patterns. If this is correct, I think the challenge for us will be when and how to best stray from the underlying rules, once the size of the bigger lab puzzles later forces us to. I think we will have to find a new layer of rules for the big labs.

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes

Posted 6 years ago

Eli Fisker

Mat and I discussed double same turning GC-pairs. Neither of us was too happy about them in Classic Eterna labs. Mat had them as a last option in his lab design strategy (non preferred sequencing, page 3), whereas I made strategies banning them. Now we are both using them in labs.

I think the reason more same turning base pairs are tolerated now comes down to the simple fact that we have longer sequences. They are simply more needed, as having all base pairs twisted would also create unwanted complementarity. I actually think I understand why huge RNA wants so many same turning GC pairs: it is simply to help avoid complementarity. And the reason the bots did badly in lab was that they had way too many double or longer runs of same turning pairs in rather short RNA sequences, creating tons of ways for sections of the RNA to be complementary to each other. They simply had too many options to pair up with themselves in unintended ways. If a pattern is complementary to another pattern, both are repeated with too high a frequency, and they have a rather strong pull (I mean GC or other Watson-Crick base pairing, as opposed to non complementary binding), then the likelihood that a mispairing will occur is simply bigger. This holds in particular if that pattern is not locked safely away inside a longer stem, in a structure that is made stable at the closing base pair spot and the pair next to it.
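The point about repeated, mutually complementary patterns giving the RNA more ways to pair with itself can be illustrated with a simple count. This is only a sketch: the function names, the window size `k=4` and the toy sequence are assumptions made for illustration, only Watson-Crick pairs are considered (no GU), and nothing here models folding energy.

```python
from collections import defaultdict

# Watson-Crick partners only; GU wobble pairs are ignored in this sketch.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(kmer):
    """Return the sequence that would pair with `kmer` in an antiparallel stem."""
    return "".join(COMPLEMENT[base] for base in reversed(kmer))


def complementary_pairs(seq, k=4):
    """List (i, j) start positions, i < j, of non-overlapping k-mers that could pair."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    pairs = []
    for i in range(len(seq) - k + 1):
        partner = reverse_complement(seq[i:i + k])
        for j in positions.get(partner, []):
            if j >= i + k:  # the two segments must not overlap
                pairs.append((i, j))
    return pairs


# Hypothetical toy sequence: three copies of the self-complementary block GCGC.
toy = "GCGCAAAAGCGCAAAAGCGC"
print(complementary_pairs(toy))  # [(0, 8), (0, 16), (8, 16)]
```

Each extra copy of the same block adds a candidate pairing with every earlier copy, which is the kind of growth in mispairing options described above.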

I considered double same turning GC-pairs particularly risky at the closing base pair spot, in multiloops in particular. But now they pass if not used too heavily. Double AU base pairs still aren’t welcome at that spot, but I know they are used in really big natural RNA, which is much longer than our sequences. However, double same turning AU’s have also shown themselves to be very useful, in particular in the middle of longer stems. Even double GU’s are allowed that way sometimes.

So now double same turning quads have shown themselves to be more legal and even called for, in particular in designs with longer stems, where they also went fine in many Classic Eterna labs. Same turning GC-pairs were also called for back in classic lab in pressured designs with many short stems, something I was wondering about. I think it comes down to something as simple as this: variation helps break repetition. Since the stems were short, not all of them could use the more optimal patterns for mid length stems, so the consequence was double same turning pairs for more of these short stems.

So I think what I stated in My Strategy Guide for Lab, turns out to be even truer than what I expected:

“Bad pattern depends partly on location. What is bad pattern style is not necessarily bad pattern in a longer and more tolerant stems.

Every pattern is a bad pattern if repeated enough times in the same design. Every pattern is bad if put in the wrong place.”

It also now makes more sense to me that the bigger multiloops in this monster huge RNA do not follow the orientation I expected from the more middle sized multiloops in our much smaller RNA designs. The tendency for the bigger multiloops was to care much less about what to us is the usual orientation, and rather go mixed or reverse orientation. I think there are more rules to discover for these big multiloops.

I don’t know, but I have an idea that the scientist bots take their patterns not just from energy calculations, but also from naturally occurring RNA. Could it be that the bots have taken these patterns from RNAs much different in size from those we are handling, or from a mixture of sizes? I think it matters very much for reuse of patterns that one takes them from a similarly sized design, with a similar length of stem and positioning of that stem. I think a stem, despite being the same length, takes a different solve depending on whether it is adjacent, placed on a more relaxed multiloop, or tucked onto an internal loop. I think it matters what sits at both ends.
eternacac

This is why I always go back and look at very large natural RNA.

Also, bond stability is not the same as functional utility to the organism. So sometimes weaker bonds (lower stability) are better for the intended function, IMO.
Eli Fisker

Hi Chris!

I'm happy to hear that you too like to look at very large natural RNA. For me it was a revelation - though at first a very confusing one, because I only knew our smaller and synthetic RNA.

The lower stability bonds and weaker areas in natural RNA are something that would result in less blue and more unstable SHAPE data for our RNA. If all the RNA is intended to do is hold the structure, then deep dark blue SHAPE data is what to go for with stems. Like you, I think there are spots and areas that will particularly call for a less strong bind, especially when the RNA has an enzyme or switch like function.
rhiju, Researcher

OK — thanks for the discussion. If you had to pick a long shape of 250 nts (and we could only make 8-16 designs), what would it be? This one?

Or are there player puzzles you’re more interested in? For example, there may be some that were unsolvable by bots even 'in silico’. What if players could not only solve them in silico but also in vitro?

rhiju, Researcher

By coincidence (or not), I've been immersed in the ribosome structure. One good read is a recent paper on how the structure might have evolved: 

Sounds like it's time for Eterna to get into 3D fractal design...
Eli Fisker

Sweet! :)

Thx for the paper. Will check it out.
Eli Fisker

I read the paper and this is amazing!

“Here we show that rRNA growth occurs by a limited number of processes that include inserting a branch helix onto a preexisting trunk helix and elongation of a helix.”

Trunk helix seems to be slang for some kind of ear piercing. So really trunk helix means pierced stem, when it comes to RNA context. :)

Ribosomal growth at the tips of the ribosomal RNA. So not only does ribosomal RNA display a fractal nature. It even grows in a fractal manner. :)

Keeping the core the same but adding at the tips. (Just like with a fractal, where the core stays the same but the smaller parts at the tips become visible when zooming in.)

It also looks like the additions generally happen somewhere in the middle of stem sections (Figure 5). I can see the joke in that. Yeast ribosomal growth - budding off like yeast. ;)

I also love that evolution actually leaves fingerprints. :)

“rRNA expansions can leave distinctive atomic resolution fingerprints, which we call “insertion fingerprints.”

I couldn’t help myself but draw on one of the images. Figure 6B. It is extremely beautiful.

In the first one I added lines for the different phases. I started in the middle of the first phase and tried to put each line so it went through the middle of the distribution of that phase. I found it interesting that there was a fan pattern, with an even or growing(?) angle. It's like there was an angle shift in growing direction for each phase built on. The first two phases switch opposite to the others.

Here I tried with no starting center but just trying to put the line somewhere in the middle of the color distribution of each phase.

Eli Fisker

Extra fun fractal "fact" today.

Saw this image and I'm positive that squid tentacles are fractal. :)

While searching for more on fractal squids, I found this page:

Turns out bacteria can make fractals too. :)

While several of these fractals are far more regular and pure fractals, some of these fractals are also shifted in angle when making new sections, kind of like what I see on the phase colored image from the article Rhiju shared. I'm aware that 2D isn't the same as 3D, so things may look different in space with the angling. Still I find it interesting.
Eli Fisker

Machinelves sent me an awesome letter on how to make music of an RNA sequence. I enjoyed it so much, that I decided to share the fun. Here it is and with the same title as she gave it:

Musical RNA!


go to this website

click on the clipboard to paste in a sequence

then paste in the sequence, like this one from your latest lab design


then click on the paper airplane to get a URL of the song

you can also change the kind of tones by clicking on the music note

:D :D :D

Type a tone - in use on other types of RNA

Of course I had to test this out on a part of a ribosome. :)

Here is the result:

The G’s are a very characteristic part of the melody and rhythm. They are more frequent than the other letters, sit close together, and mostly come as singles and doubles. They also have the highest pitch.

Here is how I did it. I searched the Protein Data Bank: I set polymer to RNA, searched for ribosomal, chose E. coli and found a 23S ribosomal RNA. I checked under the sequence menu and opened FASTA to download the sequence.

I later listened to long noncoding RNA too. It resembles it a little, but holds less order. For the fun of it I also tried out messenger RNA. I have earlier noted that there were many repeat bases and less mixing of bases in messenger RNA, and as I suspected it wasn’t as musical as I would have liked.



The G rhythm pattern wasn’t as clear in the non coding RNA and it disappeared in the messenger RNA. The ribosomal RNA even had some kind of rhythm to the C’s as well - probably not strange, since many of them bind up with G’s in stems. It had an overall musical pattern to it, although it was far less ordered in both tone and rhythm patterns than usual music.
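The impression that the G's are frequent and come mostly as singles and doubles could also be checked by counting rather than by ear. A minimal sketch, assuming the sequence is available as a plain string (for example pasted from a downloaded FASTA file); the function names and the short test string are made up for illustration and are not a real rRNA sequence.

```python
from collections import Counter
from itertools import groupby


def base_counts(seq):
    """Count how often each letter occurs in the sequence."""
    return Counter(seq.upper())


def run_lengths(seq, base="G"):
    """Count how often `base` occurs in uninterrupted runs of length 1, 2, 3, ..."""
    return Counter(
        len(list(group))
        for letter, group in groupby(seq.upper())
        if letter == base
    )


# Hypothetical toy string, only to show the output format.
toy = "GAGGCGGGAG"
print(base_counts(toy)["G"])  # 7
print(run_lengths(toy, "G"))  # two single G's, one double, one triple
```

Running the same two functions on a ribosomal, a noncoding and a messenger RNA sequence would give numbers to compare against what the melodies suggest.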

Messenger RNAs sound more repetitive. I could get used to ribosomal melodies - they still have some kind of a pattern to them. :)

Proteins sound cute. This one starts out beautiful:



Silly suggestion for lab functions - get an option to listen to the lab designs. Joke aside. ;)

Hint: the bad ones will sound way too repetitive. :)

Please do tell if you find other fun types of RNA, DNA or protein music. Even if they are ugly. :)

Eli Fisker

Does RNA have domains like proteins do?

I had read about proteins having domains and this was what led me to wonder. So I asked Rhiju if RNA has domains, like proteins do.

Rhiju: Hmm interesting. I think it’s an open question how to prevent different segments of an RNA chain from base pairing to each other, although soon we’ll want to have rules for this separability when we try to compose 3D structures from designed domains.

In proteins, domains are sections that can have a function on their own. It’s kind of like an autonomous working unit.

From what I understand, in proteins there can be multiple domains with identical structure and identical sequence, and they work fine together with no misfolding. So I think protein domains are different from RNA domains in a fundamental way.

What is a domain in RNA

When it comes to RNA and domains, I think of domains kind of like a section.

Elements are not a section, a section is made of elements. I think of a section typically as separated by gap bases or it could be a branch coming off from a multiloop.

For static designs I think of a structure like a multiloop and its attached stems could be a section.

However, I think for switches an RNA domain can be as small as two stems and an aptamer.

E.g. in one of the Top notch designs, the switching seemed to occur mainly in the small stem after the aptamer. I can’t know for sure how much of the structure is unnecessary for allowing the switch in the aptamer, but I imagine that if just the static end of the aptamer is long and stable enough, perhaps a bit longer than in this actual design, then it should be possible to get this mini section to switch on its own. So that could be an example of a domain in RNA.

I have earlier been kind of angry at designs with adjacent stems in multiloops, because in the beginning I saw them turn up mainly in designs that didn’t have many winners. I originally thought it necessary to have a base or more between the different elements to keep them apart from each other. But I found that it was mainly designs with adjacent multiloops and also very short stems that caused the biggest problems.

I think length of stem, and varying stem length, is one of the keys to keeping domains and sections in an RNA design separate from each other. So is varying loop bases and gap distance (unpaired bases).

However, I still think that having a few single bases between different elements can help keep stems from interfering with each other too easily. I’m also aware that adjacent stems can actually help stabilize each other via coaxial stacking. Which leads me to wonder: does coaxial stacking only happen in longer stems?

Repeat structure - and their effect on the RNA fold

I think RNA does not have domains in the same way as proteins, with both identical sequence and structure. We have had labs with repeat structures. The branches lab was one of them.

Even when there was a solution for a section that was absolutely strongest and best, it wouldn’t stay strongest and best if repeated in multiple identical sections.

What I have learned over time is that one can’t keep repeating the best sequence, neither on an element basis nor on a section basis, to solve identical elements or sections. If you do that you won’t get the same structure. I recall how Dimension9 kindly pointed out, in one of my first lab designs, that my design was almost too symmetrical. That the stems were too similar.

This varying of sequence in stems of identical length is something Mat has been practicing from the start and has pointed out in his lab design strategy. If there is more than one of the same element, you need to vary your solve and pick the next best option for solving the structure. This is why he has a whole list of the next option to pick when the best option has already been used.

Mat’s Lab Design Strategy

Repeat sequences - and their effect on the RNA fold

So imagine this. You have an unfilled RNA string of A’s. Then you fill in two identical letter sequences in that would fold into the same secondary structure if they were alone. Now will they fold into the identical structures their sequence carried in them? Ok, you can often get away with two, without misfold and with identical and intended secondary structure forming.

It also depends on the length of the repeat sequences, their intended structure and the number of repeats. If they are fairly short, like 4-5 bases, and there are 2-5 identical copies, and they are close in space, you can abuse their similarity to make a switch happen. :)
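Short repeat sequences of this kind can be located in a design with a simple scan. This is a minimal sketch under stated assumptions: the function name, the default window of 5 bases and the toy sequence are all made up for illustration, and the scan only reports exact repeats in the sequence, not whether they are close in space.

```python
from collections import defaultdict


def repeated_kmers(seq, k=5):
    """Map every k-base stretch that occurs more than once to its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


# Hypothetical toy sequence with one 5-base stretch used twice.
print(repeated_kmers("GGAUCAAAAGGAUC", k=5))  # {'GGAUC': [0, 9]}
```

The start positions give the distance in sequence between copies, which is one of the factors mentioned above; closeness in 3D space would need a structure model and is not covered by this sketch.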

Repeats and switches

Preferably I want these short 4-5 base repeat switch sequences backed up in a tiny corner of the design (in the obligate switching area or the unpaired bases around it), with the main part of the design static and non moving, so they have little other choice than to jump between their intended target/s. So I think keeping the static part of the design non complementary to the switch repeat sequences is a successful way of keeping the switching part of the design from interfering with the part that is meant to be static.

It mainly depends on the size of the repeats and the stem lengths, and a bit on their position in relation to each other. If they are close in sequence, misfolds happen more easily; similarly, if they are close in 3D space, misfolds happen more easily too.

Repeats and stem length

I bet if stem lengths are generally short, repeats will start interfering much faster - even with just two identical sequences. But if stems are long enough or gaps between them are long enough, you may still have them folding into their intended identical structures.

Structure repeats

The branches lab has two identical sub branches. A strange pattern ran through the whole lab for the second of the branches, something which caused me quite a bit of wondering. I did not understand why the second branch would need a pattern I would normally think of as inferior.

This design has repeat structure.

In both these designs and the lab, many of the winning designs used this normally less secure pattern as a way to create variance in the sequence and avoid misfolds.

So the need to vary the sequence of identical structures, to keep the sequences from pairing up, is what I now think is the source of the double AU’s in the second branch. It is a pattern that is regularly allowed in stems, but is more risky next to a closing base pair, as it sometimes breaks open. I counted the solve for the second branch as weaker than that of the first branch.

None of the designs in the Branches lab have identical sequences in the two structurally identical branch sections. However, I bet that if the experiment were run again with the winning designs, with just one of the branches mirrored onto the other (and vice versa), then most of the designs would not do nearly as well as their originals.

So if the sequence is in the structure, it depends a lot on what that sequence is also in sequence with.

Elements repeats

Here is an example of repeat elements. Two 4 base pair stems with identical closings, but with different middles, and most of the winners used a pattern at one spot which I considered less good, just like in the branches lab. Notice that the two other small 3 base pair stems, which are also repeat elements, are solved differently too.

Sum up on protein versus RNA domain

  • Protein domains - Both structure and sequence are identical for multiple identical domains

  • RNA domains - Sequences start to differ for multiple identical structures. In RNA, two identical target structures generally mean two different sequences. Two identical sequences may also not yield two identical structures.

Two longer repeat sequences will normally not mean two identical structures. Three repeat sequences and you won’t recognize the original structures you thought were going to form.

Two repeat structures means two identical structures with different sequences (if the sequences do not differ, you are at risk of misfolds). Three repeat structures and the sequences definitely have to differ some.

If you use many more repeat structures, you simply don’t have enough strong solutions to solve the individual sections or elements - plus you gather repeat in sequence, due to similarity in sequence between the optimal solves for an element or section.

For RNA the structure is not only the sequence, as it is for proteins. The structure is the sum of the sequence plus how big the potential is for it and its surrounding sequences to fit better elsewhere.

That misfolding factor can to some extent be controlled, at least in smaller lab designs, by things like not making stems too short, varying stem length and not keeping too much repeat structure. So simply by raising the length of stems and keeping stem lengths and gap lengths varied, you also build in some security against misfolding. Controlling the ratio of GC, AU and GU pairs is a way of doing the same.

If you gather many repeat structures, you in practice enforce close to identical repeat sequences, and the result is misfolds, because due to the similarity in sequences they are also compatible with each other. If you have two repeat structures and you space them well enough and the stems are not short, you may get away with it. But by adding more, you are asking for trouble. And this is why the bots regularly come up short on our puzzles, as our puzzles often harbor repeat structures and same length stems to a degree which I haven’t seen in any natural RNA.

When repeat structures and repeat sequences are plentiful - you will get the misfold from hell. :)

If you use repeat sequence, you don’t get repeat structures

If you use repeat structures, you don’t get repeat sequence.

Which leads to: If you both use repeat structures and repeat sequence, you are doomed. :)

At least if you are not aiming for a ginormous beautiful misfold. :)

Advice on making RNA domains

To make an RNA domain and keep it intact, it will help to ensure that it is not identical to anything else in the RNA design. Make it different in sequence and structure. If you want two identical domains, make sure both their sequence and structure differ slightly. The same goes for elements, at least if they are close.

Why vary both structure and sequence for similar domains?

One may get away with identical structures if one varies the sequence a bit. One may get away with more sequence similarity if one varies the structures a bit. (Provided the two domains are not large and pressured.)

However, tilting both factors just a notch gives a much stronger hand against misfolding: structurally identical enough to perform the same function, but different enough in sequence not to misfold. You stack the bases in your favor and raise the chances of getting a good fold. :)

RNA loves one chicken foot, it likes two chicken feet, but 3 chicken feet is a monster.

rhiju, Researcher

where do the strong (GC-reinforced) helices show up in 3D? Are they clustered, or do they form an extended 'skeleton' of strong bones?
Eli Fisker

I would say if you imagine that the small and large subunits are each a side of a rib cage, then I would imagine the reinforced areas to bend like ribs, each embracing its own subunit.

I highlighted the GC reinforced areas in the simplified Thermus Thermophilus I made.

Reinforcement in 2D.

Now the thermophilus ribosome has more reinforcement than the E. coli ribosome that I also drew on. Usually the reinforcement shows up right around one or both sides of the multiloop, on the highway driving straight through the stems connecting the multiloops. I simply think the multiloops are the scaffold and the domains are simply add-ons.

I don’t know how to simulate a ribosome in 3D, as it is bigger than the 500 bases that Chimera allows. So I went and looked at images instead. I also have problems finding ribosome sequences for the organisms I want to look at. PDB has only a little RNA. :)

However, I found an image that I think shows what we are after.

So the ribs are bent and with a twist. Looks slightly like protein alpha helices actually :)

rhiju, Researcher

oh i meant just to take the 3D structure of the RNA and just color in helices that you've highlighted as GC-rich. I wonder if they are clustered in the core of the 23S rRNA, or are extended throughout like in your leaf diagrams!

Not sure how to do in Chimera, but this is pretty easy to do in Pymol if you want --  you can load in a ribosome structure like this one:

I was messing around with some coloring (for another reason), but it's easy to color particular residues with a command like "color teal, resi 579-584+1256-1261+670-681+779-810". That colors the separate sets 579-584, 1256-1261, ... teal; I think that happens to be one of the sets of helices you highlighted in DII!
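When there are many highlighted helices, typing the residue ranges by hand gets error prone. The sketch below only builds the command text in the same format as the example above, so it can be pasted into the PyMOL command line; it does not call PyMOL itself, and the helper names are assumptions made for illustration.

```python
def resi_selection(ranges):
    """Join (start, end) residue ranges into the 'a-b+c-d' form used above."""
    return "+".join(f"{start}-{end}" for start, end in ranges)


def color_command(color, ranges):
    """Build a command string of the form 'color <color>, resi <ranges>'."""
    return f"color {color}, resi {resi_selection(ranges)}"


# The four ranges quoted in the post above.
helices = [(579, 584), (1256, 1261), (670, 681), (779, 810)]
print(color_command("teal", helices))
# color teal, resi 579-584+1256-1261+670-681+779-810
```

One list of ranges per highlighted region, each with its own color, would reproduce a 2D highlighting on the 3D structure.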
Eli Fisker

I can't open the ribosome drop box file, I am not sure what program it takes. Pymol takes a license, where Chimera is free. But would be super cool to color things with commands. :)
Eli Fisker

Ok, I think I understand your question better now. For the Thermus Thermophilus, many of the stem stretches between the multiloops are GC rich over an extended length. Whereas in E. coli the GC rich spots tend to cluster around the multiloops.

I found some awesome ribosome images in the paper you linked, just on top of the literature list.

I found something that is totally crazy. Large subunit from human ribosome. I have never seen so many GC base pairs in line in anything living. :)

There is also a huge amount of GC’s elsewhere, not just in the long appendages. We are the most complex organism in the image library, but not exactly an extremophile.

I found even more toys there. Ribovision is pretty cool. If I hover over any base it will tell me its conservation. So I picked the Thermus Thermophilus that I’m already into and hovered over the GC heavy regions around the multiloops. The closer to the multiloop, the more conservation there seems to be, which makes sense. Also, many of these GC heavy stretches seem generally well conserved. Sometimes there is a GC pair that’s flipped. The multiloop closing GC pairs generally had low entropy, which also makes sense.

The high GC content in T. Thermophilus can probably be explained by its harsh environment and it having to tolerate higher temperature. Similar for the marismortui ribosome, as this comes from an archaea that lives in a saline environment.

Comparing melons with apples

The paper mentions that the large ribosomal subunits have undergone more evolutionary changes than the small subunits.

“Bacterial and archaeal LSU rRNAs are composed entirely of the common core, with only subtle deviations from it. By contrast, eukaryotic LSU rRNAs are expanded beyond the common core. Saccharomyces cerevisiae LSU rRNAs are around 650 nucleotides larger than the common core rRNA. Drosophila melanogaster LSU rRNAs are larger than those of S. cerevisiae by 524 nucleotides. Homo sapiens LSU rRNAs are larger than those of D. melanogaster by 1,149 nucleotides.”

“The differences in the small ribosomal subunit (SSU) components are more modest, with 69 additional nucleotides in the H. sapiens SSU rRNA over S. cerevisiae and 258 additional nucleotides in S. cerevisiae over E. coli (SI Appendix, Table S1).”
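The step-by-step differences in the two quotes can be added up to see how much bigger the expansion is in the large subunit than in the small one. The numbers below are taken directly from the quotes (the 650 is stated there as approximate); the variable and function names are just for illustration.

```python
from itertools import accumulate

# Extra nucleotides relative to the previous entry, as given in the quotes.
# LSU steps start from the common core, SSU steps start from E. coli.
LSU_STEPS = [("S. cerevisiae", 650), ("D. melanogaster", 524), ("H. sapiens", 1149)]
SSU_STEPS = [("S. cerevisiae", 258), ("H. sapiens", 69)]


def cumulative(steps):
    """Turn step-by-step increments into running totals over the starting point."""
    names = [name for name, _ in steps]
    totals = accumulate(extra for _, extra in steps)
    return dict(zip(names, totals))


print(cumulative(LSU_STEPS))
# {'S. cerevisiae': 650, 'D. melanogaster': 1174, 'H. sapiens': 2323}
print(cumulative(SSU_STEPS))
# {'S. cerevisiae': 258, 'H. sapiens': 327}
```

So by these figures the human LSU rRNA is roughly 2,300 nucleotides larger than the common core, while the human SSU rRNA is only about 330 nucleotides larger than the E. coli one.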

I realized that I have been comparing the e.coli small subunit with the large subunits of the other species. As such I have been comparing melons with apples, as there is quite a size difference between the small and large subunit.

So to fix that problem, I took a look at the large subunit of E. coli. This time I found it wasn’t so different from the thermophilus in GC content. The large subunit has a higher GC ratio around the central multiloops than the small subunit, the small subunit being the one where I originally found the “leaf vein” pattern of GC’s. Now my curiosity was piqued.

The only thing different between the large and small subunits from what I can tell is size. Still the large and small subunit seems different in their clustering of GC’s around the multiloop centers.

This made me wonder, is this increased GC ratio size related? Let's say the bigger the RNA, the more GC content it needs at central spots to stick together?

I mean I have seen that the longer the stems become in our lab RNA, the less they need to have GC pairs. And the shorter the stems, the higher GC ratio they need. But GC content in relation with really big RNA, that I have had no feel for. So now I wonder if GC ratio by default changes with RNA size?

I wonder if there is a size component involved when it comes to GC ratio. After all, is the human ribosome bigger than all the others? Yup. If so, that alone might explain why the heavier GC’ing is needed: to add stability to a rather big unit. Just like big elements in the periodic table have a harder time keeping their protons, neutrons and electrons together and sticking.

Are there any trends for GC ratio in relation to RNA size? If so, this could very well explain why the Homo sapiens large subunit has so much GC, as it is by far the biggest of the ribosomal large subunits.

Now I wonder which natural RNA molecule is the biggest in the world? Without using proteins for reinforcement that is. :)

Fruitfly versus man

I’m also wondering about something else. It isn’t just size of the RNA of the organism and the bigger, the more GC. There is something else in play.

The large ribosomal subunit of the fruit fly (Drosophila melanogaster), which is the second most complex organism in that image library, on the other hand looks way more AU rich on average compared with the human large ribosomal subunit. It is also more AU rich than the large subunits of several of the smaller organisms. Not sure what to make of it. Just scratching my head. :)

I found a book (page 338) that says that man has 60% GC content versus only 40% for fruit fly (d. melanogaster).

So I don’t think the whole difference in GC ratio can be blamed on RNA size. So what is different between the fruit fly large subunit and the homosapiens? Now the large subunit for man has a lot more bases, but still fruitfly one has a lot more bases than several of the large subunits for smaller organism, that still has a higher GC ratio than the fruit fly large ribosomal subunit.

I have also been wondering about how much of the ribosomal RNA could fold up on its own and have no misfolds without the protein. Cody says that ribosomal RNA is deeply interdependent on proteins for folding and that it is kind of a chicken/egg problem, as the RNA bit was thought to once be able to fold up on its own without the proteins.

Basically I am wondering how much of the sequence pattern we see in the ribosomal RNA is actually due to RNA folding, and what bit is due to protein reinforcement or the stress it puts on. Actually I would kind of have expected that the inside RNA part could relax a bit more, with all the protein around it. Again, I think the bigger ribosomes may have more protein content, which I could also imagine having an effect on the RNA.

I found a paper that deals with the formation of the ribosome. Turns out that Mg2+ ion concentration is central for the inner core of the ribosome, whereas proteins seem to take care of the outer region.

“Formation and evolution of the early PT center may have involved Mg2+-mediated assembly of at least partially single-stranded RNA oligomers or polymers. As one moves from center to periphery, proteins appear to replace magnesium ions.”

“Mg2+ density is greatest in the core region and falls off with increasing distance from the origin (fig. 6A). In the core region, there are around 0.21 Mg2+ ions with direct phosphate interactions per rRNA nucleotide. The ratio falls to nearly zero in the outer regions of the LSU.”

Mg2+ ions are kind of the neutrons of the ribosomal atom. :) Holding things together from the inside.

So I’m guessing that the human large subunit ribosomal RNA needs the extra GC because it has more bases, and that it can only compensate with Mg2+ ions and protein scaffold to a certain degree. That could also explain why it needs its GC rich appendages on the outside. I’m still scratching my head about why the fruit fly doesn’t need a higher GC ratio in its large ribosomal subunit.

But I think the generally heavier GC'ing around the multiloops is an additional way for ribosomes to help themselves stick together.

Eli Fisker

Rhiju, I was reading one of your papers that is related to ribosomes: RNA regulons in Hox 5′ UTRs confer ribosome specificity to gene regulation. (Open access, check here, search for the title and choose paper)

I find this bit of the paper particularly fascinating:

"To date, only a small class of viral IRES elements have been shown to interact with both the large and small ribosome subunits to form a translationally competent 80S ribosome [30,31]. However, a biotinylated full-length Hoxa9 5′ UTR, as well as the minimal IRES element contained within nt 944–1,266, are able to pull down ribosomal proteins from both the large and small subunits, including RPL38 (Fig. 2c, d). The full-length 5′ UTR also pulls down both 28S and 18S rRNAs (Fig. 2e), suggesting that the 80S ribosome is able to form on the uncapped Hoxa9 5′ UTR."

Translated: Basically they found out that some messenger RNA (mRNA) had two ways to get translated.

Normally mRNA is capped, which is a way for the ribosome to check that it is translating mRNA from the cell and not some opportunistic viral RNA. Viruses have found an alternative way to overcome it. Some viruses have an IRES element which can help pull the two ribosomal subunits together and make the ribosome assemble, without the usual starting machinery that is normally necessary for translating the cell's own mRNAs.

But some mRNAs also had that special IRES code that viruses use, plus a cap, and despite having the correct cap, they got translated via the virus-style element when needing to grow body parts like the skeleton.

The IRES element reminded me of this sly fellow. An octopus dragging together two coconut shells to hide itself - displaying a surprise use of tools. Here is one octopus that has perfected the art. :)

Now I think I understand why the ribosome needs to be in subunits. Proteins don’t seem to dig doing bigger switches in structure unless there is something like a pH change or a ligand binding which adds energy. It would be impractical, if not impossible, for the cell to change pH each time a new peptide bond was to be made. :)

Hmm, okay, proteins are mainly surrounding each of the RNA subunits, and the core is RNA. But then again, I don't think that RNA fancies doing such big switches either. So still same result. So what proteins especially, but also RNA, can't achieve when alone, they can achieve together. Now it also makes sense that the core is RNA, as it is better than protein at performing switching when in a much smaller version, and since the early world of life is thought to have been an RNA one.

Earlier I have kind of been thinking about most of the ribosomal RNA as space-filling RNA, with protein strings attached to keep the whole thing together. I only imagined a few sections of the RNA to actually have a specific function, for instance by having a shape that allowed space for holding a tRNA. However, what I am starting to realize from the paper Rhiju linked is that the ribosome has been built in layers, and each layer tends to add a new function on top of the existing ones.

However I do think that some of the small hairpins on the multiloops in the multiloop highways are really just space fillers, and are there more to make sure the multiloops get identity variation by having different stem count and length, so they don’t misfold.

Eli Fisker

Crystals, x-ray structure and the asymmetric nature of RNA

Now I wonder if the asymmetric nature of RNA is part of the explanation of why RNA is harder to crystallize compared to proteins?

Proteins are far more ordered and compact in their structure. Proteins form beta sheets, where side chains line up with each other in a repeated and regular manner. Even their alpha helices have ordered structures in themselves and even more when more of them line up side by side.

On a higher level, proteins also often use repeating domains/units. All of this adds up to higher orderliness. Order means denser structure and higher crystallinity, meaning it should be easier to get the structure by X-ray.

RNA rarely has bigger stem regions line up with each other. RNA can, though, have coaxial stacking, where two stems line up end to end, which adds stability and an energy bonus.

Proteins also generally contain far more symmetry compared to RNA - that's except for some very symmetric RNA switches ;). There is some symmetry to some higher order RNA structures, but not if one zooms in and looks at the details.

RNA seems fractal in an asymmetric way. Needing variations on all levels, in particular the bigger it gets.

I suspect the asymmetric nature of RNA has a good deal to do with why it is harder to obtain x-ray structures from RNA compared to protein.

It all makes sense now. :)

Now I also wonder if RNA designs with coaxial stacking are easier to crystallize than similarly sized RNA designs without?