Odd Pattern in Synthesis Data

  • Updated 7 years ago
In several otherwise successful lab designs, a single base in the interior of a stack appears inexplicably unpaired:



In some cases, this cannot be dismissed as the RNA folding into a different structure. The SHAPE data are best satisfied by the target structure, but they show certain nucleotides within stacks are unusually accessible. For want of a better term... they're "floppy".

I haven't looked at all of the past labs or their synthesis data, but I did browse through it for a few obvious examples to share. There seems to be a pattern in this (admittedly unscientific) sample: uracil seems the most inclined to be "floppy"; "floppy" bases tend to sit at the same place in a given structure; and the "floppy" uracils tend to appear in the sequence 5'-CUR-3' (where R is A or G). I have no idea if this means anything, but it's interesting.
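For anyone who wants to check this pattern against more designs, here is a minimal sketch that lists the uracils sitting in a 5'-CUR-3' context. The function name and the example sequence are made up for illustration; they do not come from any lab design.

```python
import re


def find_cur_uracils(sequence):
    """Return 1-based positions of the U in every 5'-CUR-3' triplet (R = A or G)."""
    seq = sequence.upper().replace("T", "U")
    # A lookahead is used so that overlapping candidates are never skipped.
    # m.start() is the 0-based index of the C, so the U is at m.start() + 2
    # in 1-based numbering.
    return [m.start() + 2 for m in re.finditer(r"(?=CU[AG])", seq)]


# Hypothetical sequence, for illustration only.
positions = find_cur_uracils("GGCUAACUGCUC")
print(positions)
```

The positions returned can then be compared by eye with the bases that look "floppy" in the SHAPE data for the same design.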



Could this indicate some sort of tertiary interaction? Maybe the adenine that a "floppy" uracil should pair with is participating in a tertiary interaction (an A-minor motif?) that makes the uracil appear unpaired. Consider the two nucleotides indicated by red dots below:


Quasispecies

  • 100 Posts
  • 9 Reply Likes

Posted 9 years ago


rhiju, Researcher

  • 416 Posts
  • 125 Reply Likes
Thanks for the super careful analysis! I wanted to note that there are occasionally single residues at which both the SHAPE and control (no SHAPE) reactions yield a strong band that does not subtract to zero due to experimental noise. These are typically due to minor impurities or so-called 'stops' where the reverse transcriptase that copies the RNA into fluorescent DNA gets held up due to difficulties in unraveling RNA structure.

We are working on a better way to represent such 'uncertain' bands, but until then it may be best to focus analysis on regions where multiple bands in a row are off.
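One way to apply the "multiple bands in a row" advice is sketched below. It assumes a simple reactivity cutoff separates "reactive" from "protected" residues, which is a simplification and not the actual EteRNA scoring; the function name, the cutoff of 0.5, and the sample values are all hypothetical.

```python
def mismatch_runs(reactivity, paired, threshold=0.5, min_run=2):
    """Return 1-based inclusive (start, end) ranges where at least `min_run`
    consecutive residues disagree with the target structure."""
    runs = []
    start = None
    n = min(len(reactivity), len(paired))
    for i in range(1, n + 1):
        reactive = reactivity[i - 1] >= threshold
        # A residue is "off" if it is paired but reactive,
        # or unpaired but protected.
        off = reactive == bool(paired[i - 1])
        if off:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and n - start + 1 >= min_run:
        runs.append((start, n))
    return runs


# Hypothetical data: residue 2 is an isolated outlier, residues 4-6 are a run.
reactivity = [0.1, 0.9, 0.1, 0.8, 0.9, 0.7, 0.1]
paired = [True] * 7
print(mismatch_runs(reactivity, paired))
```

With `min_run=2`, an isolated reactive residue inside a stack is ignored, while a stretch of neighbouring mismatches is reported.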

Quasispecies

Thanks, rhiju. That explains a lot, especially why these single unpaired bases tend to concentrate at specific points.

The polymerase probably gets hung up on the similar secondary structures, since we're all shooting for the same target shape.

Brourd

  • 482 Posts
  • 87 Reply Likes
So, is this still a problem with the new synthesis protocols? Take a design like "The Last of the Huffmen" from the Huffman lab:





We can see that base 14 in this design is clearly not protected from chemical modification. This pattern was actually quite common in this lab, with a large number of bases being chemically reactive compared to the bases on the other side of the "stack" that they would be predicted to pair with.



Strangely enough, a large number of the designs also have a reactive guanine residue at base 52. Could these be related in some way?

rhiju, Researcher

Have you checked the raw data in the RMDB? At some residues we see very high backgrounds that don't subtract well. For those residues, however, there should be a high estimated error, as the data involve the subtraction of two large numbers that themselves have high errors. The RMDB data have an extra field, "REACTIVITY_ERROR". Can you check whether the errors are anomalously spiked at those residues?
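To illustrate why subtracting two large numbers inflates the error, here is a small sketch using standard error propagation for independent measurements. It is not the RMDB processing pipeline, and the band intensities are invented for the example.

```python
import math


def subtract_with_error(signal, signal_err, background, background_err):
    """Background-subtract a band and propagate the uncertainty,
    assuming the two errors are independent."""
    diff = signal - background
    # Independent errors add in quadrature.
    err = math.hypot(signal_err, background_err)
    return diff, err


# Hypothetical band: both lanes are strong, each known to about 5%.
diff, err = subtract_with_error(10.0, 0.5, 9.0, 0.5)
print(f"reactivity = {diff:.2f} +/- {err:.2f}")
```

In this made-up case the difference is small while its error is larger than either input error, so the relative uncertainty on the subtracted value is far worse than the 5% on each lane.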

Brourd

As a start, I decided to check the design "The Last of the Huffmen" to determine if there was a spike in the reactivity error.

EteRNA Round 72

ANNOTATION_DATA:136
MAPseq:design_name:The last of the Huffmen - Brourd - Lab 'Huffman' R1 - Sub 3 (Mixed Things up a little bit)
MAPseq:project_name:Huffman by hotcreek
MAPseq:ID:2713681

signal_to_noise:medium:2.309
EteRNA:score:EteRNA_score:92.0
EteRNA:score:min_SHAPE:0.050
EteRNA:score:max_SHAPE:0.550
EteRNA:score:threshold_SHAPE:0.300

According to the RMDB,

Base 14 had an assigned reactivity value of 3.2749
Base 14 had an assigned reactivity error of 0.3716

This reactivity error is higher than the others in the same sequence and may be an anomalous spike (I'm not sure how high the estimated error needs to be to count as one).
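Lacking an official definition of "anomalous", one possible way to flag a spike is a robust z-score based on the median and the median absolute deviation (MAD) of the errors along the sequence. The sketch below is only one reasonable choice; the cutoff of 3.5 is a commonly used convention, and the error values are hypothetical stand-ins rather than the real RMDB numbers.

```python
import statistics


def flag_error_spikes(errors, cutoff=3.5):
    """Return 1-based positions whose error is unusually high
    relative to the rest of the sequence (modified z-score > cutoff)."""
    med = statistics.median(errors)
    mad = statistics.median(abs(e - med) for e in errors)
    if mad == 0:
        return []  # no spread to compare against
    return [
        i
        for i, e in enumerate(errors, start=1)
        if 0.6745 * (e - med) / mad > cutoff
    ]


# Hypothetical REACTIVITY_ERROR values; position 4 plays the role of base 14.
errors = [0.05, 0.06, 0.04, 0.37, 0.05, 0.07, 0.05]
print(flag_error_spikes(errors))
```

Because the median and MAD are barely affected by one large value, a single spiked residue does not hide itself by inflating the baseline.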

Base 52 on the other hand, appears to simply be a product of various forces being exerted upon that C-G closing base pair.

Base 52 had an assigned reactivity value of 0.6857
Base 52 had an assigned reactivity error of 0.1559

Most likely, this guanine residue is reactive due to the multiple branch loop in that particular design. Combined with a low SHAPE threshold, it is unlikely to be shown as "protected" in the game.
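As a rough illustration of the threshold point, the sketch below simply compares a reactivity with the threshold_SHAPE value from the annotation. This is an assumption about how the display decides "protected" versus "exposed"; the actual game scoring may normalize the data first, and the function name is made up.

```python
SHAPE_THRESHOLD = 0.300  # threshold_SHAPE from the annotation for this design


def looks_exposed(reactivity, threshold=SHAPE_THRESHOLD):
    """Assumed rule: a base is shown as exposed if its reactivity
    exceeds the threshold."""
    return reactivity > threshold


# Reactivity values quoted from the RMDB for this design.
for base, value in [(14, 3.2749), (52, 0.6857)]:
    print(base, "exposed" if looks_exposed(value) else "protected")
```

Under this assumed rule, both base 14 and base 52 fall on the exposed side, with base 52 much closer to the threshold.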

Further investigation shall be made into the other designs of this lab, and perhaps a project shall be made to investigate base 52, which could be exposed due to the asymmetric nature of the multiple branch loop that base pair 20-52 closes, the length of the preceding stack, or a combination of both factors.

Both the reactivity and error values for this design are available in this Google spreadsheet, with bases 14 and 52 highlighted in yellow.

Last of the Huffmen Reactivity Values

Now, referring back to what rhiju posted in his original response to Quasispecies' post:

"We are working on a better way to represent such 'uncertain' bands, but until then it may be best to focus analysis on regions where multiple bands in a row are off."

For these bases where the estimated error may be high, is the development team still working on a way to represent such 'uncertain' bands in the game?

I suppose when players are handed some control over the scoring of designs for their projects, these types of issues can be overlooked in the final score.