Huge RNA

Huge Natural RNA

Shortly after the Wired article about Eterna, a biochem student/scientist paid our game a visit. I asked him about his background. He said he had taken a biochem class with Harry Noller, who had studied the ribosome. He then showed me a picture that looked like nothing I knew from the game. The closing basepairs in the multiloops weren’t all following the GC-orientation rule, though many were. And they weren’t even all GC-pairs. The loops were odd too. Here is the exact picture he showed me.



It made me start thinking about why natural RNA looked so different from ours. Today a thought popped up in my head. What if the Branches lab instead looked like this?



Or this?



I suspect that neither shape will be solvable with as many same turning GC-pairs in the multiloop, or with the G-pattern followed in the end loops, as the winning designs managed for the original shape.

Cloud lab 6 - Cross follows both the GC-pair orientation pattern in multiloops and the G-pattern almost perfectly.

Example of a design that follows the G-pattern 100%.


Notice the overall tendency of the winners in this lab to do the same.


But in the Branches lab there are exceptions, both to the multiloop pattern and to the G-pattern.


Outliers marked with stars.


If you look at the Branches lab, it has repeat structures: 3 multiloops of almost similar size and two twin branches. In the Cross lab, by contrast, there is only one multiloop. It kind of makes sense that there is more variation and straying from the main pattern in the Branches lab than in the simpler Cloud lab 6 - Cross.

What if the really big RNAs look like they do as a means to avoid too much repetitious pattern? What if the number of element repeats matters? The bigger the structure, the more similar elements there will be. More multiloops and more stems of the same size. How to keep those from mispairing?

I think what I’m saying is that even when a certain pattern is superior (the most prevalent in winning designs) for an element in a small puzzle and will overall work well when used, it can cause misfolds if it is repeated enough times. As several Eterna players before me have noticed, designs crave variation.

I can see that if an element is repeated many times in a small design, the elements start to stray from the usually most stable pattern. I see that 1 nt bulges are fond of having double C’s at the bottom of the bulge, but if there are many 1 nt bulges in a puzzle, a more varied solving pattern for the bulges is often favored. So it seems that using a less stable pattern will sometimes be superior to having the overall best pattern repeated too many times. The superior pattern can rule, but only until too much pattern repetition becomes an RNA structure killer.

While I was working on this post, Janelle threw a line in the chat. I think she was actually talking about lab details, as she was giving a fine lecture on lab technicalities. But I think it sums up my conclusion very well.

The smaller the RNA, the better the prediction

Let me change it to:

The smaller the RNA design, the easier the rule prediction.

I think the small lab puzzle sizes we are working with will help us find strong overall patterns. If this is correct, I think the challenge for us will be when and how best to stray from the underlying rules, once the size of the bigger lab puzzles forces us to. I think we will have to find a new layer of rules for the big labs.

Eli Fisker

  • 2222 Posts
  • 483 Reply Likes

Posted 6 years ago

Eli Fisker

Mat and I discussed double same turning GC-pairs. Neither of us was too happy about them in the Classic Eterna labs. Mat had them as the last option in his lab design strategy (non preferred sequencing, page 3), whereas I made strategies banning them. Now we are both using them in labs.

I think the reason more same turning basepairs are tolerated now comes down to the simple fact that we have longer sequences. They are simply more needed, as having all basepairs twisted would also create unwanted complementarity. I actually think I understand why huge RNA wants so many same turning GC-pairs: it is simply to help avoid complementarity. And the reason the bots did badly in lab was that they had way too many double or longer same turning pairs in rather short RNA sequences, creating tons of ways for sections of the RNA to be complementary to each other. They simply had too many options for pairing up with themselves in other ways. If a pattern is complementary to another pattern, both are repeated with too high a frequency, and they have a rather strong pull (I mean GC or other Watson-Crick base pairing, as opposed to non complementary binding), then the likelihood that a mispairing will occur is simply bigger. In particular if that pattern is not locked safely away inside longer stems, in a structure that is made stable at the closing base pair spot and the pair next to it.
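
A rough way to look at this is to count, for each short stretch of a sequence, how many other stretches could pair with it. The sketch below is only an illustration of the idea, not an energy model. The function names and the window length `k=4` are made up for the example, GU pairs are ignored, and it says nothing about which pairing would actually win in a real fold.

```python
from collections import defaultdict

# Watson-Crick partners only; GU wobble pairs are deliberately left out.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(segment):
    """Return the stretch that would pair with `segment` in a stem."""
    return "".join(COMPLEMENT[base] for base in reversed(segment))


def count_pairing_partners(sequence, k=4):
    """For each k-mer start position, count the other, non-overlapping
    positions that hold its reverse complement (possible stem partners)."""
    positions = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)

    partners = {}
    for i in range(len(sequence) - k + 1):
        target = reverse_complement(sequence[i:i + k])
        partners[i] = sum(
            1 for j in positions.get(target, []) if abs(i - j) >= k
        )
    return partners


def total_alternative_partners(sequence, k=4):
    """Crude score: the more partners in total, the more ways to mispair."""
    return sum(count_pairing_partners(sequence, k).values())


if __name__ == "__main__":
    hairpin = "GGGGAAAACCCC"      # one stem, one obvious partner per strand
    repetitive = "GCGCGCGCGCGC"   # same pattern repeated over and over
    print(total_alternative_partners(hairpin))
    print(total_alternative_partners(repetitive))
```

A sequence built from one repeated pattern gets a much higher score than a single clean hairpin of the same length, which is the point about repetition creating extra pairing options.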

I considered double same turning GC-pairs particularly risky at the closing base pair spot, in multiloops in particular. But now they pass if not used too heavily. Double AU base pairs still aren’t welcome at that spot, but I know they are used there in really big natural RNAs, which are much longer than our sequences. However, double same turning AU’s have also shown themselves to be very useful, in particular in the middle of longer stems. Even double GU’s are allowed that way sometimes.

So now double same turning quads have shown themselves to be more legal and even called for, in particular in designs with longer stems, where they also went fine in many Classic Eterna labs. Same turning GC-pairs were also called for back in the classic labs in pressured designs with many short stems, something I was wondering about. I think it comes down to something as simple as this: variation helps break repetition. Since the stems were short, not all of them could use the more optimal patterns for mid length stems, so the consequence was double same turning pairs for more of these short stems.

So I think what I stated in My Strategy Guide for Lab, turns out to be even truer than what I expected:

“Bad pattern depends partly on location. What is bad pattern style is not necessarily bad pattern in a longer and more tolerant stems.

Every pattern is a bad pattern if repeated enough times in the same design. Every pattern is bad if put in the wrong place.”

It also now makes more sense to me that the bigger multiloops in this monster huge RNA do not follow the orientation I expected from the more middle sized multiloops in our much smaller RNA designs. The tendency was for the bigger multiloops to care much less about what to us is the usual orientation, and rather go for mixed or reverse orientation. I think there are more rules to discover for these big multiloops.

I don’t know, but I have an idea that the scientist bots take their patterns not just from energy calculations, but also from naturally occurring RNA. Could it be that the bots have taken these patterns from RNAs much different in size from those we are handling, or from a mixture of sizes? I think it matters a great deal for the reuse of patterns that one takes them from a similar sized design, with a similar length of stem and positioning of that stem. I think a stem, despite being the same length, takes a different solve depending on whether it is adjacent, placed on a more relaxed multiloop or tucked onto an internal loop. I think it matters what sits at both of its ends.
eternacac

This is why I always go back and look at very large natural RNA.

Also, bond stability is not the same as functional utility to the organism. So sometimes weaker bonds (lower stability) are better for the intended function, IMO.
Eli Fisker

Hi Chris!

I'm happy to hear that you too like to look at very large natural RNA. For me it was a revelation - though at first a very confusing one, because I only knew our smaller and synthetic RNA.

The lower stability bonds and weaker areas in natural RNA are something that would result in less blue and more unstable SHAPE data for our RNA. If all the RNA is intended to do is hold the structure, then deep dark blue SHAPE data is what to go for with stems. Like you, I think there are spots and areas that will particularly call for a weaker bond, especially when the RNA has an enzyme or switch like function.
rhiju, Researcher

OK — thanks for the discussion. If you had to pick a long shape of 250 nts (and we could only make 8-16 designs), what would it be? This one?



Or are there player puzzles you’re more interested in? For example, there may be some that were unsolvable by bots even ‘in silico’. What if players could not only solve them in silico but also in vitro?

Cheers,
Rhiju
Eli Fisker

Hi Rhiju!

Np. I think this one should do. It has stems of different lengths, giving us a chance to actually solve it. Still, it has a lot more multiloops than usual, so I think the orientation of closing GC base pairs that is usually helpful for solving multiloops will start to break down.

Let me come back to you on other interesting in silico puzzles.
Eli Fisker

Fractal Folding Pattern of RNA

Rhiju, I promised to get back to you in case of more input on other interesting in silico puzzles for this test. The story sort of grew exponentially. :)



Nearest Neighbor Strand


In the case of the glycine riboswitch tandem aptamer, there are two almost identical domains of the same glycine binding aptamer, just with slightly varied structure and sequence.


A pure copy of a whole domain means a double set of many identical stems, which then all have a potential extra partner. The more distant strand copies are less likely to be a problem than the closer complementary strand copies, since an RNA strand will most of the time by far prefer to pair up with a nearest neighbor strand in sequence. But any copy strands that are close by in space or sequence will still make the risk of a misfold far bigger.


The reason why I think closeness between two identical domains matters is the way I think RNA folds.


The basic folding patterns of RNA are stems, hairpin loops and multiloops, with a few other variations such as bulges and internal loops.


I think there is some general pattern to how RNA likes to fold up. There seems to be a pattern to what kind of fold is more likely to happen.


Let's say that the first 3-5 times an RNA sequence folds up, the still single strand will generally prefer to pair with its nearest neighbor strand in sequence (provided that there are a few bases left for making a hairpin loop between the strands).


A bit rarer, but still happening regularly, the strand will instead continue the route of making a prolonged stem, just with a pause in the form of an internal loop or a bulge, and then pair with the nearest neighbor strand in space.


Most rarely (like 1 in 3-8 of the stem folds), the strand will cross fully over in space and sequence and form a neck (the closing stem of a multiloop structure).


Or as the noble Ig Nobel scientists found when investigating the standing and lying habits of cows: the longer a cow has been lying down, the more likely it is that it will soon stand up. :)


Put in RNA folding terms: the more stems that have formed of neighboring strands, the more likely it is that a multiloop (long distance sequence crossover) soon will happen.


The number of stems per multiloop needs to be varied - even on the same level in the hierarchy.




So when 3 hairpin stems have formed close by each other, the statistical likelihood that the next fold will not be a hairpin stem, but the neck of a multiloop, will rise with each stem added. A “knot” is formed between distant parts of the sequence.
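
To make the cow idea concrete, here is a toy simulation of it. It is purely a sketch under invented assumptions: the starting chance `base` and the increase `step` per hairpin are numbers picked for illustration, not measured from any RNA, and the function names are hypothetical.

```python
import random


def neck_probability(hairpins_since_last_neck, base=0.05, step=0.15):
    """Hypothetical chance that the next stem to form is a multiloop neck.

    The chance rises with every hairpin stem formed since the last neck,
    like the cow that gets more likely to stand up the longer it lies down.
    """
    return min(1.0, base + step * hairpins_since_last_neck)


def simulate_stems_per_multiloop(n_multiloops, seed=0):
    """Draw how many hairpin stems form before each neck closes a multiloop."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_multiloops):
        hairpins = 0
        # Keep forming hairpins until the rising chance of a neck wins.
        while rng.random() >= neck_probability(hairpins):
            hairpins += 1
        counts.append(hairpins)
    return counts


if __name__ == "__main__":
    print(simulate_stems_per_multiloop(10, seed=1))
```

Even with one fixed rule, the number of stems per multiloop comes out different from multiloop to multiloop, which is the kind of built in variation described above.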

Rarer yet, a super knot will be formed knotting bigger domains together, in one big giant domain, attached to yet another multiloop or in a necklace of gap bases.



Ribosomal RNA

When the RNA is really big, like in ribosomal RNA, several extra hierarchical layers are added.


There is also the added twist that for each added multiloop, the next multiloop tries to avoid being the same size and containing the same number of stems and the same stem lengths as the former multiloop, if there is one.


Typically a knot - the making of a multiloop - would happen after like 3-8 stems have formed. On a level beyond that, this multiloop branch can be stuck on a larger multiloop or on hook bases.


This pattern is fractal like, but an asymmetric one. Like a tree where not all of the branches are equally long and some are missing. And the nodes are not starting the same places. An asymmetric fractal tree.


The number of stems per multiloop needs to be varied. Stem lengths in the multiloops need to be varied. The distance between the stems in the multiloop rings needs to be varied. The sequence in similar length stems needs to be varied.
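
One of these checks, whether a design reuses the same stem length, can be sketched for a structure written in dot-bracket notation. This is a minimal illustration with hypothetical function names. It assumes a balanced structure without pseudoknots, and it counts a "stem" as a run of directly stacked pairs, so a bulge or internal loop splits a stem in two.

```python
from collections import Counter


def pair_table(structure):
    """Map each paired position to its partner in a dot-bracket string."""
    stack, pairs = [], {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return pairs


def stem_lengths(structure):
    """Return the length of every stem, in 5' to 3' order of its first base."""
    pairs = pair_table(structure)
    lengths = []
    for i in sorted(pairs):
        j = pairs[i]
        if i > j:
            continue  # visit each pair once, from its 5' side
        if pairs.get(i - 1) == j + 1:
            continue  # this pair continues a stem that started earlier
        length = 0
        while pairs.get(i + length) == j - length and i + length < j - length:
            length += 1
        lengths.append(length)
    return lengths


def repeated_stem_lengths(structure):
    """Stem lengths that occur more than once, with how often they occur."""
    counts = Counter(stem_lengths(structure))
    return {length: n for length, n in counts.items() if n > 1}


if __name__ == "__main__":
    multiloop = "((..(((...)))..(((...)))..((....))..))"
    print(stem_lengths(multiloop))
    print(repeated_stem_lengths(multiloop))
```

The same idea could be extended to the other items in the list, such as stems per multiloop or the gaps between stems in a multiloop ring, but those are left out to keep the example short.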


I’m a huge fan of the Noller lab’s colored ribosome images. I added a bit of extra color :), highlighting the pattern of multiloops "stacked" on multiloops with stems between them.




Notice that none of the highest hierarchical level (red) multiloops have identical structure. The multiloops vary in size, they have different stem lengths attached between them, and they don’t have the same number of stems. The same is true for the second level (yellow) multiloops, and for the third (green) and fourth level (light blue) ones. There is a slight but not total trend of the multiloops going down in size the further out they get. (Purple dots = start level (top). It can be a hook with gap bases like here, or a multiloop.)


It seems more important that the first two hierarchical levels of multiloops vary in size across their level than that the multiloops at the deeper levels do. I guess this is because the rest of the multiloops are already isolated deep inside the big domain. The most important bits to keep independent are the first stretches attached to the big hook rings with gap bases in between. If the domains stay separated on that level, then a good part of the job is already done. The further out in the hierarchy, the smaller the multiloops tend to become.


I have simplified the fractal image and only shown the multiloop connections to the hook bases. Notice that none of the branches are the same, except two short ones. Also notice that they are spread out with an empty branch between - one not containing a multiloop.


[Image: Multiloop dots.jpg]



With hierarchical levels drawn up (Halloween spiderweb ribosome :) )

[Image: Multiloop dots2.jpg]


What if it can be predicted where these high hierarchy knots need to be? What if there is something that characterizes them?


What I have learned from our smaller puzzles is that stems are more likely to turn up in some places than in others. The neck has a habit of forming between the first and last bits of the RNA sequence and is far less likely to be placed somewhere in the middle of the sequence (in the small labs below 80 bases). So some strand pairings are more likely than others. Similarly, I think there is a statistical likelihood of the strong knots turning up in some regions of the sequence over others. I think these knots will turn up at a particular periodicity - though with a bit of asymmetry added. And where will depend on the length of the sequence.


Also, I think there might be something else that characterizes a knotting in space. I think there might be a pattern of sealing off knot areas with a strong GC region - something I call a strong fork - to let the rest of the domain get its individual folding space, unaffected by the other regions.


I took a look at the image of the ribosomal 16S subunit on which I had earlier circled these enforced GC sections, and I think they seem to take the route through the longest straight stretches containing multiloops.




I thought that this was really interesting and decided to give the large subunit of the ribosome the same treatment. There are also GC heavy regions in stems not near any multiloops, but I find it interesting that they turn up with such regularity just around one and often two of the stems of a multiloop that is branching off into deeper level structure.





Fractals and RNA

A long time back I started to wonder if there were fractal patterns in RNA. I googled and found an older article saying that there were. Actually, Paramodic noticed that some of the bots failed on fractal puzzles. I couldn’t find much on the topic of RNA and fractals though. I mostly forgot about it again till I saw Codygeary’s fractal puzzle series, puzzles he had created with a script (RNA Fractal Maker). They reminded me of a question Rhiju had asked me, about what puzzles I would pick out for a longer RNA design that I think will not be solvable in lab, despite being solvable by players and by robots.



I thought these are exactly it, if I should pick extra ones. They are beautiful, but they are way too symmetric. So they are perfect candidates for lab disasters. :) They are solvable by players, though they do have a rather low solve count. I’m not sure the robots can solve these, but I know that Cody’s fractal script usually spits out both a structure and a solve too, if the script can solve the puzzle. Fractal Leaf 1 is solvable with the script.


These puzzles hold a ton of same length stems, symmetry and identical sections. The latter it is in a fractal’s nature to make. The rest can be set by the script parameters. One can adjust the length of the stems on the different levels.


I’m reminded of Dimension9’s lesson to me when he did a mod of one of my first lab designs, saying that he liked my design but that the sequence was almost too symmetrical. (The structure was already symmetrical in itself, which I now know makes it even worse.) Friendly advice is not forgotten.


Later Rhiju and co have also been mentioning the fractal nature of RNA. I have been thinking about fractals since Cody’s puzzles and script. But RNA is not fractal like his puzzles. RNA is fractal in a whole different way.


From what I understand about fractals, they have never ending self similarity, so that the smaller regions will be perfect replications of the bigger ones. This is not the case for RNA.


Except it is well possible for small RNAs with just very few hierarchical levels. In these we can get away with playing with repeat structure and symmetry, also in lab. Several of our first labs are proof of that. We got winners. :)


First Eterna labs




Huge RNA and domain variation

What I have learned through my time with Eterna is that natural RNA looks a whole lot different from our puzzles.


Puzzles based on natural RNA


There has always been a distinct feel to the structures in the natural RNAs - many of our early challenge puzzles were built on natural RNA - compared to our much more symmetric and ordered puzzles. But I didn’t know what it was. Now I think it is the fractal nature.


Player puzzles with symmetry and structure repeats


Additional note: Ok, now I do wonder what went on with the natural Saccharomyces cerevisiae transfer RNA. That thing looks way too symmetric and pretty. :)



I suspect that this is a case of multiple transfer RNAs on the same sequence string, and that they need to be chopped up into individual transfer RNAs, so it doesn’t matter much if the whole thing misfolds. (Yup, it really looks like it. This page has a structure similar to this beast and it says there are 5 tRNA molecules. Also, I like the fact that the last two tRNAs in the Eterna puzzle start to exhibit different structure in loop and stem position from the first 3. RNA needs variation.)


Ok, back to what fractals are normally associated with. Quote on fractals from WIKI:


“A fractal is a natural phenomenon or a mathematical set that exhibits a repeating pattern that displays at every scale. If the replication is exactly the same at every scale, it is called a self-similar pattern.”


Self similarity in structure, elements and size of elements used is not what I see happening in any of the bigger RNAs.


The fractal nature of RNA is very different from the usual symmetric fractals - with identical repeating units - perfect repeats. I think instead RNA needs asymmetry introduced at all hierarchical levels.


I found an asymmetrical fractal built in lego. I particularly like this one as it creates a globular structure, which reminds me of RNA and proteins when thinking of them in 3D.




I found the image on this fine blog post on building fractals with lego. :)


So I love how the WIKI continues:

“Fractals can also be nearly the same at different levels.”


The bigger the RNA gets, the more difference I think there needs to be on all levels. Domains should vary in size. They should contain different element combos, the size of multiloops should vary, the length of stems should vary, and the size of bulge, loop and multiloop ring should be variable too. The multiloops can not have the same size, and if there are any multiloops of similar size, they can not have a similar number or length of stems. Variation needs to be introduced on all levels if the different parts are to work together well.


In RNA the fractal self similarity is that the repeated units can not be self similar. :)



The asymmetry of fractals

I was searching youtube to see if I could find anything on asymmetrical fractals. The first thing that popped up during my search was this piece of music and since I was already in curious mode I decided to check it out, because it is in my favorite genre. YouTube, you know me too well. :)


Asymmetric Fractals - Evolution (Progressive Metal)


When I heard the music, it hit me. Asymmetric fractals are already long built into music, and I bet these guys knew. Especially since they called the song Evolution and Asymmetric Fractals is their band name. :)


Being fractal and partly asymmetric is really what all good progressive metal does (along with jazz, blues and other music of non western origin), to a far bigger degree than pop and rock, although the latter are also fractal, just with more symmetry and more repeats.



Structure of music

A song typically consists of a verse and then a chorus. The chorus gets its beauty from the contrast to the more plain verse. Call it bread and butter. It works. Then these are repeated and sometimes intersections are also added. But generally there is very little variation of the main recipe, which can make it easier to get tired of the song quickly. The best pop and rock songs avoid this by having a solid melody, varying the music slightly (background) and having the singer vary voice intensity and intonation (main melody) while singing the same chorus multiple times (repeat).


Jazz, progressive metal and classical music tend to add far more variation both to singing and music, while still keeping recognizable returning points.


A song consists of bigger sections decreasing into yet smaller sections. Exactly like fractals. Verse and chorus, sections in those and again smaller sections in these and then tones.


Machinelves reminded me that we once talked about making music literally from RNA sequences. :) And nature has already literally put a rhythm in RNA.


I dug up a song (jazz) that I love, which also has this asymmetric fractal pattern that I’m talking about standing out very clearly. This may help those of you who are not used to deciphering heavy metal.

Artist: Trombone Shorty, Album: For true, song: Big 12

Imagine it like this: There are chorus, verse and some intersections - these top hierarchy sections (multiloop or gap base) each consist of 1 or more bigger sections (multiloops) - which each consist of smaller wholes (stems), which consist of tones (bases).


Each of these smaller wholes - the stem equivalent - is varied, none are the same - not even inside the individual big section - despite many of them carrying similarities. The similarities also happen across the big sections. The jazz song has all its “stem” sections varied, whereas the metal song varies less in some of the guitar riff sections but gets its variation from the contrast in the changes that the other instruments make.


The good patterns in RNA are like guitar riffs. There are some that work really well, but they get boring (misfold) if played over and over, and work well together when varied and mixed.


Each RNA domain needs its own unique structure and sequence to be able to keep its personality and not mix with another. Similarly, a good song keeps different sections but gives each a unique twist, while still following scales, like reusing the same elements (multiloops, internal loops etc.).


Musicians are good tinkerers - just like nature.



How RNA differs from music, but is also the same

RNA seems to be even more of an asymmetrical fractal in its nature than these two songs - on all of its hierarchical levels, and more so the larger the design gets. In music, each section inside a chorus will usually have the same amount of time per subsection (comparable to each multiloop in RNA having exactly the same number of stems each time). That amount of similarity between different multiloops is not allowed.


Each subsection in music will also vary its tones and typically have the same amount of time, but can carry bigger variations in the number of tones played in that piece of time - something which is rarer in pop, but more typical in progressive metal and classical music. Similarly, a stem should carry both length differences and sequence variations.


In a section of a chorus you can get away with repeating the same piece of melody over again for a while. Similarly, in a small RNA design you can get away with multiple stems of the same length on a multiloop. But as soon as the RNA gets bigger and its repeats grow bigger, then trouble comes.


Variations are needed on every level, be it base sequence in an element, size of stems and other elements, size of multiloops, multiloop on a multiloop or multiloop on gap bases. This holds wherever any of these are repeated - even across the different hierarchical levels. Notice how every section of the branches of the ribosomes is unique, with no real structural repeats. There are different numbers of multiloops on multiloops, and different sizes of multiloops across the same hierarchical level.


I like pop and rock, but I love jazz and progressive metal exactly because these genres are playing around and tweaking the rules along the way.


I particularly love how progressive metal as a ground rule breaks some of the usual music rules, keeping a more unpredictable pattern than most other genres - while still having its own rules for the pattern breaking.

Ways to break pattern are changing the pace of the rhythm, breaking the rhythm, and having a generally high mutation rate of the core musical theme while still repeating it. I love the genre for its musical equilibrism, for experimenting, for using more skewed notes and for regularly daring to put unusual instruments together, even to the extent of genre mixing. I stumbled on a song with a mixture of heavy metal and flamenco music, and if someone had said they would mix those two genres, I would have been rather skeptical, but it was simply meant to be. :) (Example: Band: Flametal, Album: The elder, Song: Bruja tortura)


Also, I will never forget the music video I saw where the electric guitar got played with a violin bow. Poor bow. :)


There is always an element of surprise.



Full moving switches, partial moving switches and huge RNA

I’m still a bit angry with the full moving RNA switches. :) However, not so angry that I didn’t make a strategy for them. They just need their own algorithm to get happy.


Really I think they are part of the progressive metal of the RNA world. Complex and far harder to play/solve. But why go complex when simple will get the job done? For small RNA I’m just pointing to the faster road to unraveling things by making partial moving switches as they work well in an easy manner. For huge RNA we will need to go extremely progressive. :)


It turns out I love RNA for exactly the same reasons I love music. Fractals... :)


Life seems to be an eternal dance between symmetry and asymmetry.



Perspective

Now I decided to check whether others were seeing fractals in ribosomal RNA too. And there were. These guys have been looking at the smaller ribosomal 16S subunit. They are talking about fractional dimension.


I’m not sure I fully understand that bit yet. From what I get, I think it is about the packing of the fractal in space. When I think about it, it actually kind of makes sense. If RNA were a perfectly symmetrical, pure fractal, I think at least some fractal types would be crappy at packing themselves up into a somewhat round globe, as many RNAs and proteins are.


Now I recall the 3D fractals that Machinelves sent me. They were utterly beautiful and it is now rather funny what I said:


“Best drag my head out from there again. I have RNA adventures to make and more finds to share.”


I had not thought these two worlds to be so interrelated, or I might have gotten lost in fractal dimension. :)


One of the 3D fractals was structured like Swiss cheese, just with triangular holes. :) Perfectly symmetric fractals of the kind that leave unpacked space would give too many holes, with empty space not used up and costing energy (at least they would in some proteins), and too many crammed up regions.


Proteins seem to be fractal too, but I think not in the same asymmetrical way as huge RNA.


Tissue formation follows a fractal pattern too, and it too fills a space in 3D. I think there is a part of the fractal pattern that sees to it that things end up fitting together well in 3D space.


I wonder if one day in the future, instead of putting together stem motifs for an RNA design, it will be possible to build RNA with asymmetrical fractal patterns.
rhiju, Researcher

By coincidence (or not), I've been immersed in the ribosome structure. One good read is a recent paper on how the structure might have evolved: http://www.pnas.org/content/111/28/10251 

Sounds like it's time for Eterna to get into 3D fractal design...
Eli Fisker

Sweet! :)

Thx for the paper. Will check it out.
Eli Fisker

I read the paper and this is amazing!


“Here we show that rRNA growth occurs by a limited number of processes that include inserting a branch helix onto a preexisting trunk helix and elongation of a helix.”


Trunk helix seems to be slang for some kind of ear piercing. So really trunk helix means pierced stem, when it comes to RNA context. :)

Ribosomal growth happens at the tips of the ribosomal RNA. So not only does ribosomal RNA display a fractal nature. It even grows in a fractal manner. :)

Keeping the core the same but adding at the tips. (Just like with a fractal, where the core stays the same but the smaller parts at the tips grow visible when zooming in.)


It also looks like the additions generally happen somewhere in the middle of stem sections (Figure 5). I can see the joke in that. Yeast ribosomal growth - budding off like yeast. ;)


I also love that evolution actually leaves fingerprints. :)


“rRNA expansions can leave distinctive atomic resolution fingerprints, which we call “insertion fingerprints.””


I couldn’t help but draw on one of the images, Figure 6B. It is extremely beautiful.


In the first one I added lines for the different phases. I started in the middle of the first phase and tried to put each line so it went through the middle of the distribution of that phase. I found it interesting that there was a fan pattern, with an even (or growing?) angle. It's like there was an angle shift in growing direction for each phase built on. The first two phases switch opposite to the others.







Here I tried with no starting center but just trying to put the line somewhere in the middle of the color distribution of each phase.


Eli Fisker
Extra fun fractal "fact" today.

Saw this image and I'm positive that squid tentacles are fractal. :)

While searching for more on fractal squids, I found this page:

http://io9.com/incredible-photographs-of-fractals-found-in-the-natural-480626285

Turns out bacteria can make fractals too. :)

While several of these fractals are far more regular and pure fractals, some of these fractals are also shifted in angle when making new sections, kind of like what I see on the phase colored image from the article Rhiju shared. I'm aware that 2D isn't the same as 3D, so things may look different in space with the angling. Still I find it interesting.

Eli Fisker


Machinelves sent me an awesome letter on how to make music of an RNA sequence. I enjoyed it so much, that I decided to share the fun. Here it is and with the same title as she gave it:


Musical RNA!


omg!


go to this website

http://www.typatone.com/


click on the clipboard to paste in a sequence

http://prntscr.com/8s5stu


then paste in the sequence, like this one from your latest lab design

GUCACAGGUACAAUAGUACCAGUGACUAUUUACAUGAGGAUCACCCAUGUCUUUAUACUUAUUCUUUAUUGUGAGUUGGGAUCUA


http://www.eternagame.org/game/browse/6204761/


then click on the paper airplane to get a URL of the song


http://typatone.com/m/wLMyagFpm1


you can also change the kind of tones by clicking on the music note


:D :D :D



Type a tone - in use on other types of RNA

Of course I had to test this out on a part of a ribosome. :)


Here is the result:

http://typatone.com/m/WXMtmIz0lj


The G’s are a very characteristic part of the melody and rhythm. They are more frequent than the other letters, sit close together, mostly occur as singles and doubles, and have the highest pitch.
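
For anyone who wants to check the G pattern by counting rather than by ear, here is a minimal Python sketch. It is not part of how the melodies were made, and the function names `base_counts` and `run_lengths` are made up for illustration. It counts how often each letter occurs and how long the runs of one letter are (singles, doubles and so on).

```python
from collections import Counter
from itertools import groupby


def base_counts(seq):
    """Count how many times each letter occurs in the sequence."""
    return Counter(seq.upper())


def run_lengths(seq, base):
    """Lengths of consecutive runs of `base`, in order of appearance."""
    return [len(list(group))
            for letter, group in groupby(seq.upper())
            if letter == base]


seq = "GGAUGCGGG"  # toy example, not a real rRNA fragment
print(base_counts(seq)["G"])   # 6
print(run_lengths(seq, "G"))   # [2, 1, 3]
```

Pasting a real sequence into `seq` would show whether the G runs really are mostly singles and doubles.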

Here is how I did it. I searched the Protein Data Bank, set the polymer type to RNA, searched for ribosomal, chose E. coli and found a 23S ribosomal RNA. Under the sequence menu I opened the FASTA option to download the sequence.

http://www.rcsb.org/pdb/explore/remediatedSequence.do?structureId=3DG0
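
A downloaded FASTA file has a header line starting with ">" followed by the sequence, usually split over several lines. Below is a small sketch for joining each record into one string that can be pasted into Typatone. The function name `parse_fasta` and the toy records are illustrative only; they are not the real 3DG0 entry.

```python
def parse_fasta(text):
    """Return {header: sequence} for every record in a FASTA string."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in records.items()}


example = ">toy_record_1\nGGUAAC\nGGCC\n>toy_record_2\nUUAA\n"
for name, sequence in parse_fasta(example).items():
    print(name, sequence)
```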


I later listened to long noncoding RNA too. It resembles the ribosomal RNA a little, but holds less order. For the fun of it I also tried out messenger RNA. I had noted earlier that messenger RNA has many repeated bases and less mixing of bases, and as I suspected it wasn’t as musical as I would have liked.


MessengerRNA:
http://typatone.com/m/Z9rMrUuCHe

Origin:
http://www.ncbi.nlm.nih.gov/nuccore/10190669?report=fasta


The G rhythm pattern wasn’t as clear in the noncoding RNA and it disappeared in the messenger RNA. The ribosomal RNA even had some kind of rhythm to the C’s as well - probably not strange, since many of them pair up with G’s in stems. It had an overall musical pattern to it, although it was far less ordered in both tone and rhythm than usual music.

Messenger RNAs sound more repetitive. I could get used to ribosomal melodies - they still have some kind of a pattern to them. :)


Proteins sound cute. This one starts out beautiful:

http://typatone.com/m/yJHiyadvvY


Origin: http://www.rcsb.org/pdb/explore/remediatedSequence.do?structureId=2N1F

 

Silly suggestion for lab functions - get an option to listen to the lab designs. Joke aside. ;)


Hint: the bad ones will sound way too repetitive. :)


Please do tell if you find other fun types of RNA, DNA or protein music. Even if they are ugly. :)

Eli Fisker

Does RNA have domains like proteins do?


I had read about proteins having domains, and this was what led me to wonder. So I asked Rhiju if RNA has domains like proteins do.


Rhiju: Hmm interesting. I think it’s an open question how to prevent different segments of an RNA chain from base pairing to each other, although soon we’ll want to have rules for this separability when we try to compose 3D structures from designed domains.


In proteins, domains are sections that can have a function on their own. A domain is kind of like an autonomous working unit.


From what I understand, proteins can contain multiple domains with identical structure and identical sequence, and they work fine together without misfolding. So I think protein domains differ from RNA domains in a fundamental way.



What is a domain in RNA


When it comes to RNA and domains, I think of domains kind of like a section.


Elements are not a section, a section is made of elements. I think of a section typically as separated by gap bases or it could be a branch coming off from a multiloop.


For static designs I think a structure like a multiloop and its attached stems could be a section.


However, I think for switches an RNA domain can be as small as two stems and an aptamer.


E.g. in one of the Top notch designs, the switching seemed to occur mainly in the small stem after the aptamer. I can’t know for sure how much of the structure is unnecessary for allowing the switch in the aptamer, but I imagine that if just the static end of the aptamer is long and stable enough, perhaps a bit longer than in this actual design, then it should be possible to get this mini section to switch on its own. So that could be an example of a domain in RNA.


http://eterna.cmu.edu/game/browse/2426190/?filter1_arg1=2502237&filter1_arg2=2502237&filter1...


I have earlier been kind of angry at designs with adjacent stems in multiloops, because in the beginning I saw them turn up mainly in designs that didn’t have many winners. I originally thought it necessary to have a base or more between the different elements to keep them apart from each other. But I found that it was mainly designs with adjacent multiloops and also very short stems that caused the biggest problems.


I think stem length, and varying stem length, is one of the keys to keeping domains and sections in an RNA design separate from each other. So is varying the loop bases and the gap distance (unpaired bases).


However, I still think that having a few single bases between different elements can help keep stems from interfering with each other too easily. I’m also aware that adjacent stems can actually help stabilize each other via coaxial stacking. Which leads me to wonder: does coaxial stacking only happen in longer stems?



Repeat structure - and their effect on the RNA fold


I think RNA does not have domains in the same way as proteins, with both identical sequence and structure. We have had labs with repeat structures. The branches lab was one of them.


Even when there was a solution for a section that was absolutely the strongest and best, it wouldn’t stay strongest and best if repeated in multiple identical sections.

What I have learned over time is that one can’t keep repeating the best sequence, on either an element basis or a section basis, to solve identical elements or sections. If you do that, you won’t get the same structure. I recall how Dimension9 kindly pointed out in one of my first lab designs that my design was almost too symmetrical. That the stems were too similar.


This varying of sequence in identical length stems is something Mat has been practicing from the start and has pointed out in his lab designing strategy. When there is more than one of the same element, you need to vary your solve and pick the next best option for the later copies. This is why he has a whole list of next elements to pick when the best option has already been used.

Mat’s Lab Design Strategy


Repeat sequences - and their effect on the RNA fold


So imagine this. You have an unfilled RNA string of A’s. Then you fill in two identical letter sequences that would each fold into the same secondary structure if they were alone. Now will they fold into the identical structures their sequence carries? OK, you can often get away with two, without misfolds and with the identical, intended secondary structure forming.


It also depends on the length of the repeat sequences, their intended structure and the number of repeats. If they are fairly short, like 4-5 bases, and there are 2-5 identical copies of them, and they are close in space, you can abuse their similarity to make a switch happen. :)
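
Short repeats like these are easy to miss by eye. Here is a minimal sketch for listing every stretch of `k` bases that occurs more than once in a design, together with its start positions. The function name `repeated_kmers` and the toy sequence are assumptions for illustration. It only finds exact repeats, and it counts overlapping hits too (so "AAAAA" contains "AAAA" twice).

```python
from collections import defaultdict


def repeated_kmers(seq, k=4):
    """Map each length-k subsequence that occurs at least twice to its 0-based start positions."""
    positions = defaultdict(list)
    for start in range(len(seq) - k + 1):
        positions[seq[start:start + k]].append(start)
    # Keep only the subsequences seen more than once.
    return {kmer: hits for kmer, hits in positions.items() if len(hits) > 1}


toy = "GGACAAAAGGAC"  # toy string, not a lab design
print(repeated_kmers(toy, k=4))  # {'GGAC': [0, 8]}
```

Running it with `k=4` and `k=5` on a lab design would show how many 4-5 base repeats it carries and how far apart they sit in sequence.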


Repeats and switches


Preferably I want these short 4-5 base repeat switch sequences backed up in a tiny corner of the design (in the obligate switching area or the unpaired bases around it), with the main part of the design static and non-moving, so they have little other choice than to jump between their intended targets. So I think playing a game of non-complementarity between the static part of the design and the switch repeat sequences is a successful way of keeping the switching part of the design from interfering with the part that is meant to be static.
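
The non-complementarity idea can be checked roughly in code. The sketch below is only an assumption-laden illustration, not a folding prediction. It looks for places in a static region where a short switch sequence could pair along its full length in antiparallel orientation, counting AU, GC and GU wobble pairs as allowed. It ignores energies, loops and partial pairing. The name `pairing_sites` and the toy sequences are made up.

```python
# Pairs treated as allowed: Watson-Crick plus GU wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairing_sites(region, probe):
    """0-based start positions in `region` where `probe` could pair fully, antiparallel."""
    k = len(probe)
    reversed_probe = probe[::-1]  # first base of the window faces the last base of the probe
    sites = []
    for start in range(len(region) - k + 1):
        window = region[start:start + k]
        if all((x, y) in PAIRS for x, y in zip(window, reversed_probe)):
            sites.append(start)
    return sites


print(pairing_sites("AAAGUCCAAA", "GGAC"))  # [3]
print(pairing_sites("AAAAAAAA", "GGAC"))    # []
```

An empty list for the static part would mean the switch repeat has no full-length partner there, which is the situation described above.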


It mainly depends on the size of the repeats and the stem lengths, and a bit on their position in relation to each other. If they are close in sequence, misfolds happen more easily; likewise, if they are close in 3D space, misfolds happen more easily too.


Repeats and stem length


I bet if stem lengths are generally short, repeats will start interfering much faster - even with just two identical sequences. But if stems are long enough or gaps between them are long enough, you may still have them folding into their intended identical structures.



Structure repeats


The branches lab has two identical sub branches. A strange pattern ran through the whole lab for the second of the branches, which caused me quite a bit of wondering. I did not understand why the second branch would need a pattern I would normally think of as inferior.


This design has repeat structure.

https://getsatisfaction.com/eternagame/topics/slightly_skewed_energy_tendency_for_the_whole_lab


In both these designs and this lab, many of the winning designs use this normally less secure pattern - as a way to create variance in the sequence and avoid misfolds.


So the need to vary the sequence of identical structures, to keep the sequences from pairing up, is what I now think is the source of the double AU’s in the second branch. It is a pattern that is regularly allowed in stems, but is more risky next to a closing base pair, as it sometimes breaks open. I counted the solve for the second branch as weaker than that of the first branch.


None of the designs in the Branches lab have identical sequences in the two structurally identical branch sections. However, I bet that if the experiment were run again with the winning designs, with just one of the branches mirrored onto the other (and vice versa), then most of the designs would not do nearly as well as the originals.


So if the sequence is in the structure, it depends a lot on what that sequence is also in sequence with.


Elements repeats


Here is an example of repeat elements. Two 4 base pair stems with identical closings but different middles, and most of the winners used a pattern, just like in the branches lab, at one spot which I considered less good. Notice that the two other small 3 base pair stems, which are also repeat elements, are solved differently too.

https://getsatisfaction.com/eternagame/topics/even_energy_distribution_continued



Sum up on protein versus RNA domain


  • Protein domains - Both structure and sequence are identical for multiple identical domains

  • RNA domains - Sequences start to differ for multiple identical structures. In RNA, two identical target structures generally mean two different sequences. Two identical sequences may also not yield two identical structures.


Two longer repeat sequences normally will not mean two identical structures. With three repeat sequences, you won’t recognize the original structures you thought were going to form.


Two repeat structures mean two identical structures with different sequences (if the sequences do not differ, you are at risk of misfolds). With three repeat structures, the sequences definitely have to differ some.


If you use many more repeat structures, you simply don’t have enough strong solutions to solve the individual sections or elements - plus you accumulate repeats in sequence, due to the similarity in sequence between the optimal solves for an element or section.


For RNA, the structure is not determined by the sequence alone, as it is for proteins. The structure is the sum of the sequence plus how big the potential is for it and its surrounding sequences to fit better elsewhere.


That factor for misfolding can to some extent be controlled, at least in smaller lab designs, by things like not making stems too short, varying stem length and not keeping too much repeat structure. So simply by raising the length of stems and keeping stem lengths and gap lengths varied, you also build in some security against misfolding. Controlling the ratio of GC, AU and GU pairs is a way of doing the same.


If you gather many repeat structures, you in practice enforce close to identical repeat sequences, and the result is misfolds, because similar sequences are also compatible with each other. If you have two repeat structures and you space them well enough and the stems are not short, you may get away with it. But as you add more, you are asking for trouble. And this is why the bots regularly come up short on our puzzles, as our puzzles often harbor repeat structures and stems of the same length to a degree which I haven’t seen in any natural RNA.


When repeat structures and repeat sequences are plentiful - you will get the misfold from hell. :)


If you use repeat sequence, you don’t get repeat structures

If you use repeat structures, you don’t get repeat sequence.


Which leads to: If you both use repeat structures and repeat sequence, you are doomed. :)


At least if you are not aiming for a ginormous beautiful misfold. :)


Advice on making RNA domains


To make an RNA domain and keep it intact, it helps to ensure that it is not identical to anything else in the RNA design. Make it different in sequence and structure. If you want two domains that do the same job, make sure both their sequence and structure differ slightly. The same goes for elements - at least if they are close.


Why vary both structure and sequence for similar domains?

One may get away with identical structures if one varies the sequence a bit. One may get away with more sequence similarity if one varies the structures a bit. (Provided the two domains are not large and pressured.)

However, tilting both factors just a notch gives a much stronger hand against misfolding. Structurally identical enough to perform the same function, but different enough in sequence to not misfold. You stack the bases in your favor and raise the chances of getting a good fold. :)


RNA loves one chicken foot, it likes two chicken feet, but three chicken feet is a monster.


Meechl

  • 81 Posts
  • 27 Reply Likes
Hey Eli, this comment reminded me of a paper I ran into where they describe a technique that can potentially be used to identify domains in large RNA:
http://www.ncbi.nlm.nih.gov/pubmed/23927838

Eli Fisker

Hi Meechl!

That paper sounds super interesting. I will read it when I get near somewhere that has access.

Thx for the tip. :)

Eli Fisker


Thx for the paper! As suspected from the abstract, it was a really interesting read.


I find it interesting that long noncoding RNAs (lncRNAs) are involved in epigenetics and so many other things.


The paper got me thinking about a few things. It said:


“Determination of the secondary structure of lncRNAs is a hard problem: the number of possible secondary structure solutions grows exponentially with the transcript length.”


“A rough estimate for a sequence of length 76 with 21 base pairs (e.g., tRNA(Phe)) yields 10^28 possible secondary structures, in comparison to 10^385 for a sequence of length 1000 with 200 base pairs (e.g., a relatively small lncRNA such as the steroid receptor RNA activator). A lncRNA of length 2.2 kB with 500 base pairs has approximately 10^888 possible secondary structures.”
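
The exponential growth can be seen with a small counting sketch. It is a toy under stated assumptions and not the paper's estimate: it counts every non-crossing pairing pattern for a chain of `n` bases, lets any two bases pair, requires at least `min_loop` unpaired bases inside a hairpin, and does not fix the number of base pairs. So its numbers will not match the figures quoted above. The function name `count_structures` is made up for illustration.

```python
def count_structures(n, min_loop=3):
    """Number of pseudoknot-free pairing patterns on n bases (sequence ignored)."""
    counts = [1] * (n + 1)  # chains too short to form a hairpin have 1 pattern: all unpaired
    for length in range(min_loop + 2, n + 1):
        total = counts[length - 1]  # case 1: the last base is unpaired
        # case 2: the last base pairs with base k (1-based), leaving
        # k - 1 bases before the pair and length - k - 1 bases inside it
        for k in range(1, length - min_loop):
            total += counts[k - 1] * counts[length - k - 1]
        counts[length] = total
    return counts[n]


for n in (5, 6, 7, 20, 40, 76):
    print(n, count_structures(n))
```

Even this simplified count climbs very quickly with length, which is the point the paper is making.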


This got me wondering if something like even energy distribution could be put to use for this problem. I know structure is the unknown with regard to lncRNAs. But in our static RNA labs, where we had a structure given, there is a much more limited number of solves that will actually fly, just by discriminating by even energy distribution.


Only rarely do stems solve with a much higher or lower energy than the average for the rest of the design. The energy models, especially Vienna, are far more tolerant towards designs that would not fly in the lab. Nature (at least the lab kind) is far more conservative in what it will allow.
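
One crude way to put numbers on how evenly a design is spread out, offered only as a sketch: split a dot-bracket structure into stems and report the share of GC pairs in each stem. This is a stand-in chosen for illustration, not an energy calculation and not how even energy distribution is judged in the labs. The function names and the toy sequence are assumptions.

```python
def base_pairs(structure):
    """Return sorted (i, j) index pairs from a dot-bracket string."""
    stack, pairs = [], []
    for index, char in enumerate(structure):
        if char == "(":
            stack.append(index)
        elif char == ")":
            pairs.append((stack.pop(), index))
    return sorted(pairs)


def stems(structure):
    """Group directly stacked pairs (i, j), (i+1, j-1), ... into stems."""
    groups = []
    for i, j in base_pairs(structure):
        if groups and groups[-1][-1] == (i - 1, j + 1):
            groups[-1].append((i, j))
        else:
            groups.append([(i, j)])
    return groups


def stem_gc_fractions(sequence, structure):
    """Fraction of GC pairs in each stem, in 5' to 3' order of the stems."""
    sequence = sequence.upper()
    fractions = []
    for stem in stems(structure):
        gc = sum(1 for i, j in stem if {sequence[i], sequence[j]} == {"G", "C"})
        fractions.append(gc / len(stem))
    return fractions


print(stem_gc_fractions("GAAAAAUC", "((....))"))  # [0.5]
```

A design where one stem sits far above or below the others in this list would be a candidate for the kind of outlier described above.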


I’m aware lncRNAs likely have a much different folding pattern than our much smaller RNA, just as they differ from ribosomal RNA and mRNA. But if there were some kind of energy distribution limit - maybe with a different average - and it could be used to discriminate better between which structures are possible, based on the structures that are already known, it could be interesting.


The Thermus thermophilus ribosomal RNAs that I have been talking about earlier in this post actually do have an uneven energy distribution, which is rather interesting. The main lines that carry the road of multiloops have a higher GC pair ratio in those sections (orange). So do huge structures with a need for stability allow for bigger variation, with higher energy in certain sections?


https://lh5.googleusercontent.com/l-9dB-6AW3Ob_q1s2fIfPmEannW74q2xxgaBCChwyj1sC8IpyiqszI9SArPGZz2MsHh2rjgxhbbhBgvIhTJ0k3XHOH1JWZDdxg-KM0D0aUf4AjInuIUNBJNc7XawK8vVb_oiLls



I wonder if this has anything to do with this specific ribosomal RNA being from an extremophile that can take extreme temperatures? What is the normal pattern for ribosomes? I wonder if they have these strong forks too in the multiloop branching area? I found a paper with an image of a yeast ribosome (S. cerevisiae) and that didn’t show the same strong strengthening around the multiloops. And now I recall seeing other ribosomal RNAs that don't carry the strengthening pattern either. Note to self: organism and its environment matter a lot. :)

Eli Fisker

Structural observations for lncRNA’s


I googled for images on long noncoding RNA.


There is also a distinct feel to these. I’m aware that I have only seen a minority of the around 15000 that the paper mentions are estimated to exist for humans, and I'm not sure those I have seen are human. So this is only what I see from a small glimpse of the whole.


They seem to have a middle starting point that they branch off from. They don’t branch off wildly, but typically into 2, 3 or 4 big branches. There is even a symmetric feel to them - on one or two axes. :)


2440 base pairs

http://lncipedia.org/db/transcript/lnc-C14orf101-5:1


8708 base pairs

http://www.lncipedia.org/db/transcript/lnc-SCYL1-1:1


Something mirror-like, but still a bit asymmetric and bent. It's like there is a close to even distribution of branching off, although the actual stems and domains themselves vary, and their positions too.


I think they are still fractal, just in a different way. At the beginning level they may be more symmetric in their fractal parts, but further out I think they go asymmetric again, and in stem positioning they also go slightly asymmetric. Also, when looking at them lengthwise, the two ends are not balanced against each other; one often holds far more bases than the other.


I think one reason lncRNAs look and branch off differently from ribosomal RNA is that they don't need to be "small", rather compact globular spheres, like when the ribosomal RNA large and small subunits form up.

Eli Fisker


Even energy distribution - not the case for big Ribosomal RNA


After realizing that the thermophilus ribosomal RNA I had been looking at belonged to an extremophile that loves high temperatures, I thought that the excess GC ratio around the multiloop skeleton in the ribosome was due to the organism and its preferred environment.


While this GC reinforcement of the ribosomal main roads is extremely noticeable in the thermophilus ribosomal RNA, I remembered that I originally saw the strong fork pattern in another ribosome. When I checked for the organism, it belonged to E. coli, which can hardly be called heat loving. :) The pattern is not as outstanding, but it is still there.


Thermophilus from the Noller lab, with strong GC reinforcement pattern

https://lh6.googleusercontent.com/12gmBzlPJE9JHNFwDQRD5TddE8wndRKGrIIq2a3hOs5HoysmUIYf_iz8lBtuxoslWE8kfvFa-umP5ei8VVD_o4lUaaDG_P9mifwzu9OEZEA-qR9w2yw31TF4QxGuUrHRho82TDY


E. coli from Aymen Yassin, with weaker GC reinforcement pattern


https://lh4.googleusercontent.com/GzUYYsaF48BFH0lWMjEPKon0RuiuWcGARIj8mGkQayDUh17CTuRtrWtPHBMnL_Wy8d83jYt2jgPFtUHTprA_zaPs5rsYrQbTAM5ro75DbE9hlGwsJXotp8TeG8_MLiZowpnwOgQ



So this means that even energy distribution is off the table for ribosomal RNA, as this uneven GC distribution totally breaks the pattern - no discussion. Is it also off the table for all big RNAs, and only workable on smaller RNAs? Is there something special about ribosomal RNA?


The pattern I see in ribosomal RNA actually reminds me very much of the veins in a leaf. The main roads being thicker - and in RNA more strongly reinforced.


Image by PublicDomainPhotos


I decided to draw in “multiloops”. Here the earliest levels are also bigger than the lower levels, although the leaf vein branching “algorithm” is way more predictable than RNA multiloop size.



Really I shouldn’t be surprised. :) I have already seen that the GC orientation pattern for multiloops, which stands out strongly in a rough estimate of 80-90% of small RNA designs (below 100 bases), breaks down when the RNA gets bigger. It only goes to confirm that some of the RNA folding rules change with the size of the RNA.

rhiju, Researcher

where do the strong (GC-reinforced) helices show up in 3D? Are they clustered, or do they form an extended 'skeleton' of strong bones?

Eli Fisker


I would say that if you imagine the small and large subunits as each being one side of a rib cage, then I would imagine the reinforced areas bending like ribs, each embracing its own subunit.


I highlighted the GC reinforced areas in the simplified Thermus Thermophilus I made.


Reinforcement in 2D.



Now the thermophilus ribosome has more reinforcement than the E. coli ribosome that I also drew on. Usually the reinforcement shows up right around one or both sides of the multiloop, on the highway driving straight through the stems connecting the multiloops. I simply think the multiloops are the scaffold and the domains are add-ons.


I don’t know how to simulate a ribosome in 3D, as it is bigger than the 500 bases that Chimera allows. So I went and looked at images instead. I also have problems finding ribosome sequences for the organisms I want to look at. PDB has only a little RNA. :)


However, I found an image that I think shows what we are after.

http://www.cipsm.de/publications/research-area-c/localization-of-eukaryote-specific-ribosomal-proteins-in-a-5.5-a-cryo-em-map-of-the-80s-eukaryotic-ribosome/


So the ribs are bent and with a twist. Looks slightly like protein alpha helices actually :)

rhiju, Researcher
oh i meant just to take the 3D structure of the RNA and just color in helices that you've highlighted as GC-rich. I wonder if they are clustered in the core of the 23S rRNA, or are extended throughout like in your leaf diagrams!

Not sure how to do in Chimera, but this is pretty easy to do in Pymol if you want --  you can load in a ribosome structure like this one:

https://dl.dropboxusercontent.com/u/21569020/whole_ribosome_2b64_2b66_PTChighlight.pse

I was messing around with some coloring (for another reason), but it's easy to color particular residues with a command like "color teal, resi 579-584+1256-1261+670-681+779-810". That colors the separate sets 579-584, 1256-1261, ... teal; I think that happens to be one of the sets of helices you highlighted in DII!
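
If there are many highlighted helices, typing the residue list by hand gets tedious. A minimal sketch that builds the same kind of command string from a list of (start, end) residue ranges is shown here. The helper name `color_command` is made up for illustration, and the output is plain text meant to be pasted into the PyMOL command line, so the sketch itself does not need PyMOL installed.

```python
def color_command(color, ranges):
    """Build a PyMOL 'color' command for a list of (start, end) residue ranges."""
    resi = "+".join(f"{start}-{end}" for start, end in ranges)
    return f"color {color}, resi {resi}"


# The ranges from the example command above.
helices = [(579, 584), (1256, 1261), (670, 681), (779, 810)]
print(color_command("teal", helices))
# color teal, resi 579-584+1256-1261+670-681+779-810
```

One list of ranges per GC-rich helix set, each with its own color, would give one command per set.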

Eli Fisker

I can't open the ribosome drop box file, I am not sure what program it takes. Pymol takes a license, where Chimera is free. But would be super cool to color things with commands. :)

Eli Fisker


OK, I think I understand your question better now. For Thermus thermophilus, many of the stem stretches between the multiloops are GC rich over an extended length, whereas in E. coli the GC rich spots tend to cluster around the multiloops.


I found some awesome ribosome images in the paper you linked, just on top of the literature list.


I found something that is totally crazy: the large subunit from the human ribosome. I have never seen so many GC base pairs in a row in anything living. :)


http://apollo.chemistry.gatech.edu/RibosomeGallery/H%20sapiens/LSU/3D%20structure%20based/index.html#H_sapiens_Fine_Grained_Onion_2.png


There is also a huge amount of GC elsewhere, not just in the long appendages. We are the most complex organism in the image library, but not exactly an extremophile.


I found even more toys there. Ribovision is pretty cool. If I hover over any base it will tell me its conservation. So I picked the Thermus thermophilus that I’m already into and hovered over the GC heavy regions around the multiloops. The closer to the multiloop, the more conservation there seems to be - makes sense. Many of these GC heavy stretches also seem generally well conserved. Sometimes there is a GC pair that’s flipped. The multiloop closing GC pairs generally had low entropy, which also makes sense.


The high GC content in T. thermophilus can probably be explained by its harsh environment and its need to tolerate higher temperatures. Similar for the marismortui ribosome, as this comes from an archaeon that lives in a saline environment.


Comparing melons with apples


The paper mentions that the large ribosomal subunits have undergone more evolutionary changes than the small subunits.


“Bacterial and archaeal LSU rRNAs are composed entirely of the common core, with only subtle deviations from it. By contrast, eukaryotic LSU rRNAs are expanded beyond the common core. Saccharomyces cerevisiae LSU rRNAs are around 650 nucleotides larger than the common core rRNA. Drosophila melanogaster LSU rRNAs are larger than those of S. cerevisiae by 524 nucleotides. Homo sapiens LSU rRNAs are larger than those of D. melanogaster by 1,149 nucleotides.”


“The differences in the small ribosomal subunit (SSU) components are more modest, with 69 additional nucleotides in the H. sapiens SSU rRNA over S. cerevisiae and 258 additional nucleotides in S. cerevisiae over E. coli (SI Appendix, Table S1).”


I realized that I have been comparing the E. coli small subunit with the large subunits of the other species. As such I have been comparing melons with apples, as there is quite a size difference between the small and large subunit.


So to fix that problem, I took a look at the large subunit of E. coli. This time I found it wasn’t so different from the thermophilus in GC content. The large subunit has a higher GC ratio around the central multiloops than the small subunit - the small subunit being the one in which I originally found the “leaf vein” pattern of GC’s. Now my curiosity was piqued.


The only thing different between the large and small subunits, from what I can tell, is size. Still, the large and small subunits seem different in their clustering of GC’s around the multiloop centers.


This made me wonder, is this increased GC ratio size related? Let's say the bigger the RNA, the more GC content it needs at central spots to stick together?


I mean, I have seen that the longer the stems become in our lab RNA, the fewer GC pairs they need. And the shorter the stems, the higher the GC ratio they need. But for GC content in relation to really big RNA, I have had no feel. So now I wonder if GC ratio by default changes with RNA size?


I wonder if there is a size component involved when it comes to GC ratio. After all, is the human ribosome bigger than all the others? Yup. If so, that alone might explain why the heavier GC’ing is needed, to add stability to a rather big unit. Just like big elements in the periodic table have a harder time keeping their protons, neutrons and electrons together.


Are there any trends for GC ratio in relation to RNA size? If so, this could very well explain why the Homo sapiens large subunit has so much GC, as it is by far the biggest of the ribosomal large subunits.


Now I wonder which natural RNA molecule is the biggest in the world? Without using proteins for reinforcement that is. :)



Fruitfly versus man


I’m also wondering about something else. It isn’t just that the bigger the RNA of the organism, the more GC. There is something else in play.


The large ribosomal subunit of the fruit fly (Drosophila melanogaster), which is the second most complex organism in that image library, on the other hand looks way more AU rich on average than the human large ribosomal subunit. It is also more AU rich than the large subunits of several smaller organisms. Not sure what to make of it. Just scratching my head. :)


I found a book (page 338) that says that man has 60% GC content versus only 40% for the fruit fly (D. melanogaster).
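
GC content here means the share of letters in a sequence that are G or C. A minimal sketch for computing it on a downloaded rRNA sequence, plus a sliding-window version to see where the GC rich stretches sit, could look like this. The function names, the window size and the toy sequence are assumptions for illustration; the 60% and 40% figures above come from the book, not from this code.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def gc_windows(seq, window):
    """GC fraction of every window of the given length, sliding one base at a time."""
    return [gc_fraction(seq[start:start + window])
            for start in range(len(seq) - window + 1)]


toy = "GGCCAAUU"  # toy sequence, not a real rRNA
print(gc_fraction(toy))    # 0.5
print(gc_windows(toy, 4))  # [1.0, 0.75, 0.5, 0.25, 0.0]
```

Comparing `gc_fraction` for the large subunit sequences of different organisms against their lengths would be one simple way to look for a size trend.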


So I don’t think the whole difference in GC ratio can be blamed on RNA size. So what is different between the fruit fly large subunit and the Homo sapiens one? The large subunit for man has a lot more bases, but the fruit fly one still has a lot more bases than the large subunits of several smaller organisms that have a higher GC ratio than the fruit fly large ribosomal subunit.


I have also been wondering how much of the ribosomal RNA could fold up on its own, without misfolds, without the proteins. Cody says that ribosomal RNA is deeply dependent on proteins for folding and that it is kind of a chicken/egg problem, as the RNA is thought to have once been able to fold up on its own without the proteins.

Basically I am wondering how much of the sequence pattern we see in the ribosomal RNA is actually due to RNA folding and how much is due to protein reinforcement or stress. Actually I would kind of have expected that the inner RNA part could relax a bit more, with all the protein around it. Again, I think the bigger ribosomes may have more protein content, which I could also imagine having an effect on the RNA.


I found a paper that deals with the formation of the ribosome. Turns out that Mg2+ ion concentration is central for the inner core of the ribosome, whereas proteins seem to take care of the outer region.


“Formation and evolution of the early PT center may have involved Mg2þ-mediated assembly of at least partially single-stranded RNA oligomers or polymers. As one moves from center to periphery, proteins appear to replace magnesium ions.”


“Mg2+ density is greatest in the core region and falls off with increasing distance from the origin (fig. 6A). In the core region, there are around 0.21 Mg2+ ions with direct phosphate interactions per rRNA nucleotide. The ratio falls to nearly zero in the outer regions of the LSU.”
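The per-nucleotide ratio in that quote can be pictured as a count in concentric shells around the origin. Here is a rough sketch of such a count, assuming we already have the distance from the origin for each nucleotide and for each Mg2+ ion that directly contacts a phosphate. The function name, the shell width and the toy distances are all hypothetical and are not taken from the paper.

```python
def mg_per_nucleotide_by_shell(nt_distances, mg_distances, shell_width):
    """Bin nucleotides and Mg2+ ions into shells of equal width by distance
    from the origin; return (shell_start, shell_end, ions_per_nucleotide) rows."""
    shells = {}  # shell index -> [nucleotide count, ion count]
    for d in nt_distances:
        shells.setdefault(int(d // shell_width), [0, 0])[0] += 1
    for d in mg_distances:
        shells.setdefault(int(d // shell_width), [0, 0])[1] += 1

    rows = []
    for idx in sorted(shells):
        n_nt, n_mg = shells[idx]
        # A shell that holds ions but no nucleotides has no defined ratio.
        ratio = n_mg / n_nt if n_nt else None
        rows.append((idx * shell_width, (idx + 1) * shell_width, ratio))
    return rows


# Toy distances in arbitrary units (made up purely for illustration).
toy_nt = [5, 8, 12, 15, 18, 25, 28, 35, 38, 45]
toy_mg = [6, 14]

for start, end, ratio in mg_per_nucleotide_by_shell(toy_nt, toy_mg, 20):
    print(f"{start}-{end}: {ratio} Mg2+ per nucleotide")
```

With real coordinates, a ratio that is highest in the innermost shell and drops towards zero further out would match what the quote describes.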


Mg2+ ions are kind of the neutrons of the ribosomal atom. :) Holding things together from the inside.


So I’m guessing that the human large subunit ribosomal RNA needs the extra GC because it has more bases, and that it can only compensate with Mg2+ ions and protein scaffold to a certain degree. That could also explain why it needs its GC-rich appendages on the outside. I’m still scratching my head about why the fruit fly doesn’t need a higher GC ratio in its large ribosomal subunit.

But I think the generally heavier use of GC pairs around the multiloops is an additional way for ribosomes to help themselves stick together.

Eli Fisker


Rhiju, I was reading one of your papers that is related to ribosomes: RNA regulons in Hox 5′ UTRs confer ribosome specificity to gene regulation. (Open access, check here, search for the title and choose paper)


I find this bit of the paper particularly fascinating:


"To date, only a small class of viral IRES elements have been shown to interact with both the large and small ribosome subunits to form a translationally competent 80S ribosome (refs 30, 31). However, a biotinylated full-length Hoxa9 5′ UTR, as well as the minimal IRES element contained within nt 944–1,266, are able to pull down ribosomal proteins from both the large and small subunits, including RPL38 (Fig. 2c, d). The full-length 5′ UTR also pulls down both 28S and 18S rRNAs (Fig. 2e), suggesting that the 80S ribosome is able to form on the uncapped Hoxa9 5′ UTR."


Translated: Basically they found out that some messenger RNAs (mRNAs) had two ways to get translated.


Normally mRNA is capped, which is a way for the ribosome to check that it is translating mRNA from the cell and not some opportunistic viral RNA. Viruses have found an alternative way to get around this. Some viruses have an IRES element, which can help pull the two ribosomal subunits together and make the ribosome assemble without the usual starting machinery that is normally necessary for translating the cell’s own mRNAs.


But some mRNAs also had that special IRES element that viruses use, in addition to a cap, and despite having the correct cap, they got translated via the IRES-like element when the organism needed to grow body parts like the skeleton.


The IRES element reminded me of this sly fellow. An octopus dragging together two coconut shells to hide itself - displaying a surprise use of tools. Here is one octopus that has perfected the art. :)


Now I think I understand why the ribosome needs to be in subunits. Proteins don’t seem to dig making bigger switches in structure unless there is something like a pH change or a ligand binding, which adds energy. It would be impractical, if not impossible, for the cell to change pH each time a new peptide bond was to be made. :)


Hmm, okay, proteins mainly surround each of the RNA subunits, and the core is RNA. But then again, I don't think RNA fancies doing such big switches either. So still the same result. So what proteins especially, but also RNA, can't achieve alone, they can achieve together. Now it also makes sense that the core is RNA, as RNA is better than protein at switching when in a much smaller version, and since the early world of life is thought to have been an RNA one.


Earlier I had been thinking of most of the ribosomal RNA as space-filling RNA, with protein strings attached to keep the whole thing together. I only imagined a few sections of the RNA to actually have a specific function, for instance by having a shape that allowed space for holding a tRNA. However, what I am starting to realize from the paper Rhiju linked is that the ribosome has been built in layers, and each layer tends to add a new function on top of the existing ones.

However, I do think that some of the small hairpins on the multiloops in the multiloop highways are really just space fillers, and that more of them are there to make sure the multiloops get identity variation through different stem counts and lengths, so they don’t misfold.

Eli Fisker

Crystals, x-ray structure and the asymmetric nature of RNA


Now I wonder if the asymmetric nature of RNA is part of the explanation for why RNA is harder to crystallize than proteins?

Proteins are far more ordered and compact in their structure. Proteins form beta sheets, where side chains line up with each other in a repeated and regular manner. Even their alpha helices have ordered structures in themselves and even more when more of them line up side by side.


On a higher level, proteins also often use repeating domains/units. All of this adds up to higher orderliness. Order means denser structure and higher crystallinity, meaning it should be easier to get the structure by X-ray.

RNA rarely has bigger stem regions lining up with each other, though RNA can have coaxial stacking, where two stems line up with each other, which adds stability and an energy bonus.

Proteins also generally contain far more symmetry than RNA - that is, except for some very symmetric RNA switches ;). There is some symmetry in some higher-order RNA structures, but not if one zooms in and looks at the details.

RNA seems fractal in an asymmetric way. Needing variations on all levels, in particular the bigger it gets.

I suspect the asymmetric nature of RNA has a good deal to do with why it is harder to obtain x-ray structures from RNA compared to protein.

It all makes sense now. :)

Now I also wonder if RNA designs with coaxial stacking are easier to crystallize than similar-sized RNA designs without?