Quad Loops from Labs: don't use a G boost?!

  • Updated 8 years ago
While waiting for the database to happen, I decided to do some manual data mining for quad loop patterns from successful labs. I restricted my search to lab results that scored at least 95. From those results I copied every quad loop, both the ones that succeeded and the ones that failed, including the closing pair and the backing pair of the attached stack.

I found 245 instances of quad loops, including 42 different patterns. Only 10 of these patterns had at least 5 instances. Of those 10 patterns, there were some radical failures, most of which included the use of a "G" boost point.

UCGAAAGA was tried 27 times, and failed in 27 out of 27 cases. On the other hand, UCAAAAGA (without the G boost) was tried 40 times and succeeded in 29 cases (a 73% success rate).

AGGAAACU was tried 21 times, and failed in all cases. AGAAAACU, on the other hand, succeeded 18/22 times (82% success).

ACGAAAGU was tried 14 times, and failed in all cases. In contrast, ACAAAAGU succeeded 18/27 times (67% success).

CGGAAACG was tried 6 times, and failed in all cases. Meanwhile, CGAAAACG succeeded 14/15 times (93% success).

UGGAAACA failed 12/15 times (only 20% success), while the non-boosted UGAAAACA succeeded 9/9 times (100% success).

In addition, the CC?AAAGG pattern was used twice each with and without a boost point.

CCGAAAGG was tried (and failed) twice, while CCAAAAGG was also tried twice and succeeded both times.

While none of these patterns was tried at least 50 times, there does appear to be a strong trend: adding a solitary G boost point makes a quad loop *less* likely to form properly.
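As a rough cross-check of that trend, the counts quoted above can be pooled by whether the first loop position is a G. This is only a sketch: the `trials` table is typed in by hand from the numbers in this post, and pooling across patterns is an illustrative simplification, not something the spreadsheet does.

```python
# (successes, attempts) per 8-nucleotide pattern, copied from the counts above.
# The table and the helper below are illustrative, not part of the spreadsheet.
trials = {
    "UCGAAAGA": (0, 27), "UCAAAAGA": (29, 40),
    "AGGAAACU": (0, 21), "AGAAAACU": (18, 22),
    "ACGAAAGU": (0, 14), "ACAAAAGU": (18, 27),
    "CGGAAACG": (0, 6),  "CGAAAACG": (14, 15),
    "UGGAAACA": (3, 15), "UGAAAACA": (9, 9),
    "CCGAAAGG": (0, 2),  "CCAAAAGG": (2, 2),
}


def pooled(patterns):
    """Sum successes and attempts over a group of patterns."""
    successes = sum(trials[p][0] for p in patterns)
    attempts = sum(trials[p][1] for p in patterns)
    return successes, attempts


# The third nucleotide is the first loop position, where the G boost sits.
boosted = [p for p in trials if p[2] == "G"]
plain = [p for p in trials if p[2] == "A"]

for label, group in (("G boost", boosted), ("no boost", plain)):
    s, n = pooled(group)
    print(f"{label}: {s}/{n} = {s / n:.0%}")
```

Pooled this way, the boosted patterns succeed 3 times out of 85 and the non-boosted ones 90 times out of 115.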

There were also a few patterns that were used 1-3 times and succeeded in all cases, which might be worth looking at more closely.

The data for these quad loops is in a Google Docs Spreadsheet.

Update: Eli notes this is the data version of Mat and Jee's graphical summary

jandersonlee

  • 555 Posts
  • 130 Reply Likes

Posted 8 years ago


Quasispecies

  • 100 Posts
  • 9 Reply Likes
How did you define success and failure?

jandersonlee

A quad loop counts as a success if all of the (( and )) nucleotides were reported as bonded and all of the .... nucleotides were reported as unbonded in the matching lab results. The SOURCES sheet in the Google Docs Spreadsheet shows the data as reported in the lab results for the selected quad loops. 1 means reported as bonded; 0 means reported as unbonded. Glitches is the count of nucleotides that are not as expected. Glitches>0 for a quad loop is considered a failure; glitches==0 is a success.
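The glitch rule described above can be sketched in a few lines of Python. The function names and the choice to pass the lab data as a string of 1s and 0s are assumptions for illustration; the spreadsheet itself only records the 1/0 values per nucleotide.

```python
def count_glitches(expected, reported):
    """Count nucleotides whose reported state differs from the target shape.

    expected: dot-bracket string for the quad loop, e.g. "((....))"
    reported: same-length string of "1" (bonded) and "0" (unbonded)
    Both argument formats are assumptions made for this sketch.
    """
    if len(expected) != len(reported):
        raise ValueError("expected and reported must have the same length")
    glitches = 0
    for shape_char, bit in zip(expected, reported):
        should_be_bonded = shape_char in "()"
        is_bonded = bit == "1"
        if should_be_bonded != is_bonded:
            glitches += 1
    return glitches


def is_success(expected, reported):
    """A quad loop succeeds only when there are no glitches at all."""
    return count_glitches(expected, reported) == 0


print(is_success("((....))", "11000011"))      # stem bonded, loop free
print(count_glitches("((....))", "11101111"))  # three loop positions bonded
```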

Quasispecies

The failure-prone loops you've identified are mostly GNRA tetraloops, which are stabilized by pairing and stacking of loop bases. Perhaps the data are telling us that certain loops are more ordered than others, rather than that they are actually misfolding.

jandersonlee

From here:

"GNRA tetraloops (N is A, C, G, or U; R is A or G) are structural motifs that form basic building blocks of RNA structure that often interact with proteins or other RNA structural elements."

So are you suggesting that we should be looking further down the attached stacks to see what base pairs help to stabilize the GNRA loop? Or is something else up with the data? The 100% failure on most combinations does not seem promising!

Some GNRA cases that were reported to work:

UCGGAAGA 2/2 times.

GUGAGAGC, GAGUGAUC, GAGAGAGC, ACGGAAGU 1/1 times.

GCGAAAGC 1/2 times (50%)

UGGAAACA 3/15 times (20%)

The SOURCES tab includes the PuzzleIDs of the shapes these were from.
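The GNRA definition quoted above is easy to check mechanically against the 8-nucleotide patterns in this thread. The sketch below assumes each pattern is laid out as backing pair, closing pair, four loop nucleotides, closing pair, backing pair, so the loop is the middle four letters; the helper names are made up for illustration.

```python
import re

# G, then any nucleotide (N), then a purine (R = A or G), then A.
GNRA_LOOP = re.compile(r"G[ACGU][AG]A")


def loop_of(pattern):
    """Return the four loop nucleotides of an 8-nucleotide quad loop pattern."""
    if len(pattern) != 8:
        raise ValueError("expected an 8-nucleotide pattern")
    return pattern[2:6]


def is_gnra(pattern):
    """True if the loop portion of the pattern matches G-N-R-A."""
    return GNRA_LOOP.fullmatch(loop_of(pattern)) is not None


for p in ("UCGAAAGA", "UCAAAAGA", "UCGGAAGA", "GAGUGAUC", "UGGAAACA"):
    print(p, loop_of(p), "GNRA" if is_gnra(p) else "not GNRA")
```

With this layout, every G-boosted pattern from the original post has a GAAA loop and so counts as GNRA, while the non-boosted AAAA loops do not.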

Quasispecies

I think "success" and "failure" might need to be defined differently for certain tetraloops. The signal of a properly folded GNRA tetraloop may not be the complete absence of pairing. Here's a thread on that topic.

The chemical mapping technique used to score our designs measures the reactivity of each nucleotide's 2' OH toward a certain chemical. Nucleotides in flexible regions spend more time in positions where the 2' OH can react. Loops are generally flexible, but the GNRA loop is probably constrained by h-bonding/stacking.

jandersonlee

So in your older thread you suggest that, in a GNRA tetraloop (what I've been calling a quad loop), it may be OK for all but the N nucleotide to be marked as bonded.

I'll go back and relook at the lab results with that in mind.

jandersonlee

When I allow the G, R, and A nucleotides (G+RA) of a GNRA loop to be reported as bound and still count the loop as a success, the data do not look as bizarre. In fact, the G-boost appears to help.

Non-boosted vs. G-boosted:

UCAAAAGA 73% (29/40) vs. UCGAAAGA 89% (24/27)

ACAAAAGU 67% (18/27) vs. ACGAAAGU 86% (12/14)

AGAAAACU 82% (18/22) vs. AGGAAACU 95% (20/21)

CGAAAACG 93% (14/15) vs. CGGAAACG 100% (6/6)

UGAAAACA 100% (9/9) vs. UGGAAACA 100% (15/15)

So there definitely seems to be an issue: the G, R, and A nucleotides of a GNRA tetraloop may be reported as bound even if the loop formed correctly.

How might this be affecting the scoring of lab results?
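For reference, the relaxed criterion used for the numbers above can be sketched as follows. This is an illustrative reconstruction, not the spreadsheet's actual formula: it assumes an 8-nucleotide pattern with the loop in the middle four positions and lab data given as a string of 1s (bonded) and 0s (unbonded).

```python
def relaxed_success(sequence, reported):
    """Score a quad loop, tolerating bonded G, R and A in a GNRA loop.

    sequence: 8-nucleotide pattern, e.g. "UCGAAAGA" (assumed layout)
    reported: 8-character string of "1" (bonded) / "0" (unbonded)
    """
    if len(sequence) != 8 or len(reported) != 8:
        raise ValueError("expected 8 nucleotides and 8 reported values")

    # The four stem nucleotides must always be reported as bonded.
    stem_ok = all(reported[i] == "1" for i in (0, 1, 6, 7))

    loop = sequence[2:6]
    loop_bits = reported[2:6]
    is_gnra = loop[0] == "G" and loop[2] in "AG" and loop[3] == "A"

    if is_gnra:
        # Only the N position (second loop nucleotide) must be unbonded.
        return stem_ok and loop_bits[1] == "0"
    # Any other loop must be entirely unbonded, as in the strict rule.
    return stem_ok and loop_bits == "0000"


print(relaxed_success("UCGAAAGA", "11101111"))  # GNRA, only N is free
print(relaxed_success("UCAAAAGA", "11101111"))  # not GNRA, loop bonded
```

Non-GNRA loops such as AAAA are still held to the strict rule, which is why the non-boosted percentages are the same as in the original post.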

hoglahoo

  • 141 Posts
  • 39 Reply Likes
"How might this be affecting the scoring of lab results?"

I got here by following links, starting from a suggestion from jeehyung. There is lots of info, most of which is over my head (but very interesting, and I will go over it until I understand), but I am still a little unclear on this point about scoring lab results.

jandersonlee

I believe the gist of it is that tetra-loops are no longer penalized in scoring for seemingly having bonded nucleotides within the loop, but they still show up as (shades of) blue in the SHAPE data. The reason behind this is the non-standard bonds that occur in such end-loops which help to stabilize the loop and don't leave the nucleotides 100% free to bond with the SHAPE assay test molecules. So ultimately, the geometry of the end-loop does not look like the simplistic 2D open loop shown in the game, and the SHAPE data reflects that. BUT, my understanding is that the scoring no longer penalizes such bonds.

hoglahoo

got it. I think


Edward Lane

  • 139 Posts
  • 8 Reply Likes
Would this typically be consistent with a non-canonical "GA pair"?

If so, might it be worth giving G and A the option to pair in the Eterna model, but counting them as unpaired with regard to whether the shape was 'white and stabilised', and maybe giving the 'non-canonical pair' a low arbitrary value (equivalent to the boost energy value?)?

jandersonlee

Yes Edward, it could be seen as a GA bond. bioinfosu.okstate.edu has a page on GNRA tetraloops, but this wikimedia picture shows it well:



The G bonds with the A and one site on the R, leaving only N free. However, even that is somewhat constrained by the tightness of the loop.