Analyzing the Ribosome Challenge pilot round results

The first experimental data for redesigning the ribosome is here! What can we learn from it?

Omei Turnbull, Player Developer


Posted 2 months ago


Brourd

As a minor aside, I believe it is slightly misleading to include in the news post the sentence "As you can see, ten of our designs actually improved on the wild type." Part of the scientific process is not to impose bias on the players. A better way to say this would be "X designs had comparable (or perhaps even better) activity compared to the wild type in the synthesis of sfGFP." I could make a strong argument that several of the so-called "improved" designs are within experimental error of the wild type in this assay. With that said, here are tools that you may want to give players:

Phylogenetic tools: For each design, you need to list the mutations made to the RNA. As experimental throughput increases, it would even be cool to begin mapping phylogenetic trees of how far these sequences are from the WT ribosome.

Viewing Options: It would be interesting to see whether you can incorporate a separate view of the ribosome's 2-dimensional structure with the mutations clearly highlighted. While unlikely, it would be even better to include 3D views of the ribosome crystal/cryo-EM structures, showing the mutations in their local environment.

Data: The ability to synthesize a single protein is probably not a good measure of the robustness or overall quality of any design compared to the WT ribosome. Similar to how selection works in the development of aptamers and the in vivo evolution of proteins, it could be that whatever players did made the ribosome really good at synthesizing a specific protein, or possibly even a specific peptide sequence (unlikely, but something players need to remember when making mods of designs in the future). I'm sure the scientists are looking at taking the best of the best player sequences and transforming the plasmids into ribosome-deficient E. coli cells, so this comment is really about the conclusions players draw, but it is an important one to keep in mind.
As an additional note on the data, it could be useful to include a numerical "relative change" score compared to the WT ribosome. Throwing easily misinterpreted bar graphs and sigmoidal curves at players isn't helpful. I would recommend a numerical score as an easier, summarized way to show success or failure, along with the error associated with it.

QOL Changes: The link to the lab needs to be in an accessible place. It needs to be in the news item, and links to each design should be in the news item as well.
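To make the suggested numerical "relative change" score concrete, here is a minimal Python sketch. It assumes each design and the WT are reported as a mean with a standard deviation from independent replicates, and it uses first-order error propagation for a ratio. The function names and the 2-SD cutoff in `within_error` are hypothetical choices for illustration, not anything the lab uses.

```python
import math

def relative_change(design_mean, design_sd, wt_mean, wt_sd):
    """Design activity relative to WT, with a propagated uncertainty.

    Uses first-order error propagation for a ratio of two independent
    measurements: (sd_r / r)^2 = (sd_d / d)^2 + (sd_wt / wt)^2.
    Returns (ratio, ratio_sd, percent_change).
    """
    ratio = design_mean / wt_mean
    rel_err = math.sqrt((design_sd / design_mean) ** 2 + (wt_sd / wt_mean) ** 2)
    return ratio, ratio * rel_err, (ratio - 1.0) * 100.0

def within_error(ratio, ratio_sd, k=2.0):
    """True if the ratio lies within k propagated SDs of 1.0 (i.e. of WT).
    k = 2 is an arbitrary illustrative cutoff."""
    return abs(ratio - 1.0) <= k * ratio_sd

# Hypothetical numbers, not lab data.
r, sd, pct = relative_change(1.20, 0.05, 1.05, 0.05)
print(f"ratio {r:.2f} +/- {sd:.2f} ({pct:+.1f}%), within error: {within_error(r, sd)}")
```

A score like this gives one number per design plus a range, which also makes the "within experimental error" question explicit instead of leaving it to the eye.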

Brourd

In addition to phylogenetic tools, if you can list the accepted domain or part of the RNA that each mutation occurs in, that would be incredibly useful.

Omei Turnbull, Player Developer

Thanks, Brourd. Your point about not over-generalizing by ignoring the error bars is important.

Omei Turnbull, Player Developer

... and I've added the link for the project page (https://eternagame.org/web/lab/9162726/) to the news item now. Thanks for pointing out its absence.

SK_Magic_Malaysia

Very good and useful point, thank you.

Eli Fisker

I have added the number of mutations to the data images from the news post:


https://eterna.s3.amazonaws.com/labs/news_blog_images/Bildschirmfoto+2019-09-13+um+09.37.54.png

Some of the very best designs are indeed among those with fewer mutations. The best-performing 5S design has the fewest mutations.

Some of the worst-performing designs have many mutations. Two of the three worst-performing 16S designs have 55 mutations.

However, the happy surprise is that it seems even some designs from the 16S lab with a good number of mutations are doing well too. But I do wish to highlight that Jieux's design, with just 4 mutations, is doing really well.

I have made a spreadsheet for a quicker overview of the mutated bases in each design.

5S and 16S mutation count and mutated bases
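A mutation count like the one added to the images is a position-by-position comparison against the wild type. Here is a minimal Python sketch (the function name is hypothetical), assuming the design and WT sequences are already aligned and of equal length, that also lists each mutation in the common WT base / position / new base notation:

```python
def list_mutations(wt, design):
    """List point mutations as strings like 'C36A':
    WT base, 1-based position, design base.
    Assumes both sequences are aligned and the same length (no insertions/deletions)."""
    if len(wt) != len(design):
        raise ValueError("sequences must have the same length")
    return [
        f"{w}{pos}{d}"
        for pos, (w, d) in enumerate(zip(wt.upper(), design.upper()), start=1)
        if w != d
    ]

muts = list_mutations("GGCCAA", "GGACAU")  # toy sequences, not real rRNA
print(len(muts), muts)  # 2 ['C3A', 'A6U']
```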






Astromon

"Some of the designs doing worst have many mutations. 2 out of 3 of the 16S designs that are doing worst have 55 mutations." This is true, but the one with few mutations had violations.

DigitalEmbrace

Good catch, Astromon, the Jieux design has 3 violations. All 4 mutations are in the same hairpin. Jieux notes he chose this hairpin to address a global misfold.

Astromon

True. I think the small number of moves is why this design did well despite its violations.

jandersonlee

I'm not surprised that so far we are seeing only small bumps in performance (modulo error bars, which indicate it could be effectively the same) or larger drops in performance for the designs that do more poorly than WT. Evolution has had roughly a billion years (give or take) to tune the ribosome. It's probably far easier to break it than to improve it at this point. That most of our selected designs seem to perform similarly to or perhaps slightly better than WT is promising. Plus, this is definitely a new and different lab from any we have run before. It may take us a few rounds to figure out what the optimal "tuning" strategies might be, and to find tweaks that evolution has missed so far because they involve more than just SNP changes, or to find multiple SNP changes that are significantly additive. (Any single SNP change that by itself has a major impact has probably already been tried and incorporated in some gammaproteobacteria variant, but not necessarily in E. coli.)

Astromon

Here is some more data for the 16S results graph; I included the number of violations and the delta readings for each design.
https://prnt.sc/p7xuqu

Astromon

LinearFold-V was the engine used for the delta readings.

DigitalEmbrace

The phylogenetic (IUPAC) constraints seem to be working well. My design ignored those constraints and failed. If 23S results confirm what we are seeing, what are possible next steps for creating better designs? Focusing on lowering the energy delta created a slight improvement, but it seems like we are supposed to be trying for a much more significant performance improvement.

My first general thought (and please disagree if I am way off base) is to find designs that fix global misfolds while honoring IUPAC constraints. This could mean 4-8 mutations to the 16S that address one or two global misfolds, or it could be 10-20 mutations needed to address multiple misfolds. Actually, from what I experienced, it might take 15 mutations to correct one global misfold.

My second general thought is to explore phylogenetic covariation, looking to see if certain phylogenetic mutations occur in tandem with each other frequently across rRNA variants in E. coli.
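One common way to quantify covariation is the mutual information between two alignment columns: it is high when the base at one position predicts the base at the other, and zero when the two vary independently or do not vary at all. This is a hedged Python sketch with a hypothetical function name, assuming an alignment of related rRNA sequences is available as equal-length strings; it is not a tool the lab currently provides.

```python
import math
from collections import Counter

def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    col_i, col_j: strings with one character per aligned sequence.
    Rows with a gap ('-') in either column are skipped.
    """
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi

# Toy columns: G<->C and C<->G swap together (covariation), so MI is 1 bit.
print(mutual_information("GGCC", "CCGG"))  # 1.0
```

Pairs of positions with high mutual information across many related sequences are candidates for mutations that should be made together.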

More specific approaches: the 16S central spine, redesign pseudoknots

Omei, any clue as to when we get 23S results and any estimation when Round 2 puzzles will be up?

Omei Turnbull, Player Developer

The current prediction for getting 23S results is 1-2 weeks to get the DNA plus 1-2 weeks to do the experiment. But keep in mind that past predictions have been overly optimistic about the challenges.

Eli Fisker

Mutation calculation sheets


Recently jandersonlee took the sequences from a couple of the POTW puzzles and played around with them. He created a Google sheet that can calculate the number of mutations against the puzzle's starter sequence, and whether each mutation is in a stem or in a single-stranded base.

I think we can use tools like this for future lab analysis, as they can easily be fitted to new datasets, and more functions can be added as we find we need them. So here I present the 5S and 16S lab data, jandersonlee style. He made all the formulas; I just ripped them off. :) He even helped me create the last big chart for the 16S data and fix a few errors. I hereby pass it on. You can make a copy of the sheet and play around with it as you like.

5s and 16S rRNA from Jeff's RibosomeChallenge1.2
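The stem-versus-single-base split in the sheet can be reproduced from the target dot-bracket structure. A minimal Python sketch, assuming aligned equal-length sequences and treating any non-'.' character as paired (so pseudoknot brackets count as stems); the function name is hypothetical and this is not jandersonlee's actual spreadsheet formula:

```python
def classify_mutations(wt, design, dot_bracket):
    """Split mutated positions (1-based) into stem vs single-stranded bases,
    according to the target structure. Any non-'.' character counts as paired."""
    if not (len(wt) == len(design) == len(dot_bracket)):
        raise ValueError("sequences and structure must have the same length")
    stem, single = [], []
    for pos, (w, d, s) in enumerate(zip(wt.upper(), design.upper(), dot_bracket), start=1):
        if w != d:
            (single if s == "." else stem).append(pos)
    return stem, single

# Toy hairpin, not a real rRNA segment.
stem, single = classify_mutations("GGGAAACCC", "GAGAUACCC", "(((...)))")
print(f"{len(stem)} in stems {stem}, {len(single)} single-stranded {single}")
```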


Here are some images from the sheet: 





As you can see, for the 5S rRNA designs we mutated mostly in the loops (red):




Another mutation pattern turned up for the 5S designs: in general, a lot of bases were mutated to A, and stronger bases in the original 5S ribosome sequence were mutated to weaker bases.
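Patterns like "mutated to A" or "strong to weak" can be tallied directly from the sequences. A hedged Python sketch: it assumes G and C count as "strong" and A and U as "weak", which is one simple reading of the post rather than a definition from the lab, and the function name is hypothetical.

```python
from collections import Counter

STRONG = set("GC")  # assumption: G/C treated as strong, A/U as weak

def substitution_summary(wt, design):
    """Tally WT->design substitutions for two aligned, equal-length sequences.

    Returns (Counter of (wt_base, new_base), number mutated to A,
    number of strong->weak changes).
    """
    subs = Counter(
        (w, d) for w, d in zip(wt.upper(), design.upper()) if w != d
    )
    to_a = sum(c for (_, d), c in subs.items() if d == "A")
    strong_to_weak = sum(
        c for (w, d), c in subs.items() if w in STRONG and d not in STRONG
    )
    return subs, to_a, strong_to_weak

subs, to_a, strong_to_weak = substitution_summary("GCAU", "ACAA")  # toy input
print(dict(subs), to_a, strong_to_weak)
```

Summing these tallies over all designs in a lab gives a quick check of whether a pattern like "mostly to A" holds beyond a few examples.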





For 16S we are more all over the place in relation to mutating stems and single bases. 









jandersonlee

https://docs.google.com/spreadsheets/d/e/2PACX-1vQOp3Xz7ki3bSPa1ql62K4e-eeaysm-kQPAaxaYK5RJjOUvlau23...

It seems that the majority of mutations in the "better" 5S designs were to unpaired bases.

Omei Turnbull, Player Developer

This would be grunt work, but would someone (or several people, working together) be willing to make two spreadsheets (for the 5S and 16S) from a diagram like http://rna.ucsc.edu/rnacenter/images/figs/ecoli_16s.jpg? Each would list the base positions (1 to 1542) in one row, with a second row containing the helix number of each base position that is paired. Something like this would be the starting point for any tool that maps mutations or secondary structures to specific helices.



Brourd

So you want the helix number (going from 5' to 3') for each base? Do you have the secondary structure in dot-bracket notation of said diagram available? I believe I have a parsing script that should be able to do this (unless somebody has access to a script that is more readily available).
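For reference, converting a dot-bracket structure into a per-base helix number is a short stack-based parse. A minimal Python sketch (hypothetical function names), assuming a helix is a run of directly stacked pairs. Note that this splits a helix at every bulge or internal loop, so it will not reproduce the Noller numbering, which groups such regions into one helix.

```python
OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSE_TO_OPEN = {c: o for o, c in OPEN_TO_CLOSE.items()}

def pair_table(dot_bracket):
    """pt[i] = partner of base i (1-based), 0 if unpaired; pt[0] is unused."""
    stacks = {o: [] for o in OPEN_TO_CLOSE}
    pt = [0] * (len(dot_bracket) + 1)
    for i, ch in enumerate(dot_bracket, start=1):
        if ch in OPEN_TO_CLOSE:
            stacks[ch].append(i)
        elif ch in CLOSE_TO_OPEN:
            opener_stack = stacks[CLOSE_TO_OPEN[ch]]
            if not opener_stack:
                raise ValueError(f"unmatched {ch!r} at position {i}")
            j = opener_stack.pop()
            pt[i], pt[j] = j, i
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return pt

def helix_numbers(dot_bracket):
    """Helix number for each base (0 = unpaired), numbered 5' to 3'.

    A helix here is a run of directly stacked pairs (i, j), (i+1, j-1),
    so any bulge or internal loop starts a new helix.
    """
    pt = pair_table(dot_bracket)
    n = len(dot_bracket)
    helix = [0] * (n + 1)
    count = 0
    for i in range(1, n + 1):
        j = pt[i]
        if j > i:  # visit each pair once, from its 5' base
            if i > 1 and j < n and pt[i - 1] == j + 1:
                helix[i] = helix[i - 1]  # stacks on the previous pair
            else:
                count += 1
                helix[i] = count
            helix[j] = helix[i]
    return helix[1:]

print(helix_numbers("((..))..((.))"))
# [1, 1, 0, 0, 1, 1, 0, 0, 2, 2, 0, 2, 2]
```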

Eli Fisker

Here is a spreadsheet for 5S and 16S.

16S (Noller) and 5S (Gutell) Helix sheet

You can use it to quickly locate what helix your base is in without having to look at an image.  

It contains: 

Base positions - Base numbers

Helix number - Helix numbering, primarily according to Noller's 16S ribosome image, with a detour to Ribovision and Gutell

Helix closure - All bases inside a helix counted as belonging to the helix, including internal and end loops

Mismatch - Whether a "base pair" is a mismatch, and what kind

Dot bracket - Taken from the game. 

Bases - Taken from the game

Links - To the images the helix numbering was found from 

In the future I dream of having the option to do things like light up helix 33 and be able to see its contours glow in the puzzle. 

Until then, we have a script that can make any specific base you choose sparkle.

https://eternagame.org/web/script/9309462/

DigitalEmbrace

Great spreadsheet. This is a bit off-topic, but since Eli mentions dreaming of glowing helices, I'll mention that I was frustrated trying to navigate the 23S and see if anyone has more ideas on a new tool or navigation approach.

The problem is when two or even three helices/stems overlap in the puzzle. Tracking the path of the sequence and trying to see the pairs in helices is very time-consuming. Even with the Explosion Factor, I often couldn't see which pairs were involved in which helices. I wanted to grab the helices and pull them apart. A 3D model would help, but that doesn't seem to be on the table. I'd like to be able to make a helix disappear so I can see what is beneath it. (I plan to print out the 16S and 23S diagrams to help navigate the secondary structure in the next round.)

I would also like an easy way to see which strand a given strand is mispairing with. It's time-consuming to switch to Natural mode and track where a strand is misfolding. Highlighting in blue is the current solution and is certainly doable, and it looks like Eli's new script does this too. I use the ArcPlot booster to identify the general area of global misfolds. But if I could click (or use a hotkey) on a base in Target mode and see the number of the base it is currently pairing with, navigation would be quicker.

Eli Fisker

@DigitalEmbrace, I really like the idea of an option to see what a specific base is paired with. Even just seeing what number a specific base is would help, as locating which base is which is difficult in the larger puzzles.

Minor correction, the sparkle script is by Omei. 

DigitalEmbrace

On my first idea, I'll add that the visual density of the 23S not only makes viewing bases/colors challenging; clicking the desired base also sometimes took a bit of maneuvering with the Explosion Factor. So my ultimate fantasy is to be able to deactivate layered helices in order to isolate the desired helix.

And to clarify my second idea, I want to hotkey a given base and see the number of the nucleotide (NT) it is currently mispairing with, not which NT it should be pairing with.

@Eli Are you exploring the possibility of linking the spreadsheet to the puzzle?

Eli Fisker

@DigitalEmbrace, thanks for your additional explanation.

I will add the links for the puzzles in the spreadsheet. 

DigitalEmbrace

I meant literally link the spreadsheet to the puzzle, the way the ArcPlot tool is. Not right now but maybe down the road.

Eli Fisker

A base and its partner


Jandersonlee took the 16S (Noller) and 5S (Gutell) Helix sheet and broke the data up in a new way for the 5S lab, plus added a lot of new functions. I played the copycat and continued for the 16S lab. 

He asked me to let you know "...that some data is manually entered and should be checked. For example the helix numbers and mismatched pairs don't always exactly match with the structure. I've highlighted a few questionable cases in red."

Also be aware that the structure in Noller's ribosome image doesn't always match up with the dot-bracket structure from the game.


5S and 16S Helix Map




New spreadsheet functions:


IUPAC - The entire sequence in IUPAC, with information on which bases can be changed. 

Pairs with - Shows which base a base is paired with. (As per @DigitalEmbrace's wish; while it is not in the game yet, this is a start.) jandersonlee entered this manually for both sheets.

Base Cons - Conserved bases. Based on the IUPAC sequence, which holds information on whether close relatives of our E. coli ribosome (gammaproteobacteria) allow mutations at a specific base. The IUPAC sequence is what gives us the colored rings in the ribosome lab and puzzles, showing which changes nature has already approved of in E. coli relatives.

Fixed - Fixed bases, which have no alternative base options.

Pair Base - Shows the partner base of the sequence base.

Pair Cons - Shows whether the partner base is conserved.

Better Mods - Which bases were changed, and how many, in the better-performing designs. (manually entered)

Worse Mods - Which bases were changed, and how many, in the worse-performing designs. (manually entered)

Player designs + their sequences - Sorted with the better designs on the left and the worse designs on the right.
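The IUPAC column can also be used to count constraint violations for a design. A hedged Python sketch, assuming the IUPAC sequence uses the standard nucleotide codes and counting any base outside the allowed set as a violation; it does not distinguish soft from hard constraints, and the function name is hypothetical.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}

def iupac_violations(iupac_seq, design):
    """1-based positions where the design base is not allowed by the IUPAC code.
    T is treated as U in both inputs."""
    if len(iupac_seq) != len(design):
        raise ValueError("sequences must have the same length")
    codes = iupac_seq.upper().replace("T", "U")
    bases = design.upper().replace("T", "U")
    return [
        pos
        for pos, (code, base) in enumerate(zip(codes, bases), start=1)
        if base not in IUPAC[code]
    ]

print(iupac_violations("RYNG", "AUCA"))  # [4]
```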



jandersonlee

Minor tweak: I had intended "Cons" as an abbreviation of Constraint (not Conserved), but I guess both are appropriate. The IUPAC sequence holds the mutation (soft/hard) Constraints based on base Conservation across known Gammaproteobacteria rRNA sequence samples. If a base is seen often enough (1% or 2%; I think that is what we settled on) across aligned rRNA samples, it contributes to the IUPAC base-conservation mutation constraint for that sequence position. So if a base is "fixed", it is found to be mutated to any given alternative less than 1% of the time at that offset.
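The conservation threshold described here can be sketched as turning each alignment column into an IUPAC letter. This Python sketch assumes a simple frequency cutoff (default 1%, following the figure mentioned above) over the A/C/G/U entries in a column; the actual pipeline used to build the lab's IUPAC sequence may differ, and the names are hypothetical.

```python
from collections import Counter

# Allowed-base set -> IUPAC letter.
CODE_FOR = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("U"): "U",
    frozenset("AG"): "R", frozenset("CU"): "Y", frozenset("CG"): "S", frozenset("AU"): "W",
    frozenset("GU"): "K", frozenset("AC"): "M", frozenset("CGU"): "B", frozenset("AGU"): "D",
    frozenset("ACU"): "H", frozenset("ACG"): "V", frozenset("ACGU"): "N",
}

def iupac_for_column(column, threshold=0.01):
    """IUPAC letter for one aligned position.

    column: the bases seen at this position across aligned sequences.
    A base is allowed if its frequency among A/C/G/U entries is >= threshold;
    gaps and other characters are ignored.
    """
    bases = [b for b in column.upper().replace("T", "U") if b in "ACGU"]
    if not bases:
        raise ValueError("column has no A/C/G/U entries")
    counts = Counter(bases)
    allowed = frozenset(b for b, c in counts.items() if c / len(bases) >= threshold)
    return CODE_FOR[allowed]

column = "A" * 99 + "G"  # toy column: G seen in 1% of sequences
print(iupac_for_column(column), iupac_for_column(column, threshold=0.02))  # R A
```

In this reading, a position comes out "fixed" (a single letter) when every alternative base falls below the cutoff.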

Gerry Smith

In 5S, 2.08 had a long tail.  This design was the only one to change NT36 from a C to A (to break up four adjacent C's).  I wonder if making that same change to 2.03 and 2.05 would make their tails longer?

https://getsatisfaction.com/eternagame/topics/analyzing-the-ribosome-challenge-pilot-round-results
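Runs like the four adjacent C's can be found with a regular expression. A small Python sketch (hypothetical function name), using toy sequences rather than the actual 5S sequence:

```python
import re

def homopolymer_runs(seq, min_len=4):
    """Return (start, end, base) for runs of at least min_len identical bases.
    Positions are 1-based and inclusive."""
    pattern = re.compile(r"([ACGU])\1{%d,}" % (min_len - 1))
    return [(m.start() + 1, m.end(), m.group(1)) for m in pattern.finditer(seq.upper())]

print(homopolymer_runs("GACCCCAG"))  # [(3, 6, 'C')]
print(homopolymer_runs("GACCACAG"))  # [] : a C->A change breaks up the run
```

Running this over every design in a lab would show which designs removed or created such runs, which could then be compared with their results.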

Eli Fisker

There has been a lot of talk about error bars lately, due to our recent lab data.

Astromon asked, "What do these T's mean?"


https://eterna.s3.amazonaws.com/labs/news_blog_images/Bildschirmfoto+2019-09-13+um+09.37.54.png


Detonators... ;)


Jokes aside, if you wish to learn more about error bars, here is what I have found.



Humorous walkthrough of the dangers of ignoring error bars. 


THE IMPORTANCE OF UNCERTAINTY by Chris Holdgraf



Standard Error by Bozemanscience 











Eli Fisker

This is not the first lab for which we have gotten error data back. Perhaps you remember the fold change error from recent labs.

I found out that Chris Holdgraf who wrote the funny explanation on error bars that I mentioned in the post above, continued the fun by going into more details: 

WHAT ARE ERRORBARS, ANYWAY?



Omei Turnbull, Player Developer

I talked to Antje about how she calculated these specific error bars, and here's what I found.

She repeated the ISAT experiment on each of our designs three times, and the fidelity factor she is reporting is the average of the three experiments. In her display of the results, the distance between the top of the fat grey or yellow bar and the top of the thin black T on top represents the standard deviation of the three trials. This is closely related to the standard error of the mean (as explained in the video above) but not the same.

But for most non-statisticians, there is a more intuitive way to express this than talking about the standard error of the mean, called the confidence level. Imagine that the thin T bar is mirrored below the average (as well as above it.) The level of confidence that the actual "relative fidelity" falls somewhere in the range between the two extremes is very close to 90% (in this specific case). Thinking in terms of betting, if you were to place a bet on each design that its actual value was somewhere between the lower and upper limits, you would win the bet about 90% of the time.

Note that the relationships among the standard deviation, the standard error of the mean and the confidence level will vary based on how many times the experiment is replicated. But if there is a consensus among players that the confidence level is more intuitive than the statistics the scientists usually use, I think that Antje would be happy to prepare graphs that way when they are targeted for us.
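How much confidence a ±1 SD bar carries depends on the number of replicates and on the statistical assumptions. The Python sketch below is only an illustration, not how Antje or Omei computed anything. With n = 3 trials, mean ± 1 SD is the same as mean ± √3 standard errors of the mean. If the spread is treated as known (normal approximation), that interval covers about 92%. If the spread is itself estimated from the three trials (Student's t with 2 degrees of freedom, which has a closed-form CDF), the coverage is lower, about 77%.

```python
import math
from statistics import NormalDist

def t2_cdf(t):
    """CDF of Student's t distribution with 2 degrees of freedom (closed form)."""
    return 0.5 + t / (2.0 * math.sqrt(2.0 + t * t))

n = 3                 # replicates per design
k = math.sqrt(n)      # mean +/- 1 SD  ==  mean +/- sqrt(n) standard errors

normal_coverage = 2 * NormalDist().cdf(k) - 1  # spread treated as known
t_coverage = 2 * t2_cdf(k) - 1                 # spread estimated from the 3 trials

print(f"normal approximation: {normal_coverage:.1%}")
print(f"Student's t, 2 df:    {t_coverage:.1%}")
```

The gap between the two numbers is one reason the choice of convention matters most when there are only a few replicates.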

DigitalEmbrace

I like the graphs the way they are. (Actually, I would prefer a numerical table.) We do need the range of experimental error, if known. What is our goal relative to fidelity, 2.0? 3.0?

Eli Fisker

@omei, I would love to see an example of our data presented in the alternative way you describe. I think this would help us determine what makes most sense to us as players. 

Omei Turnbull, Player Developer

Actually, what I was suggesting is not a change you would notice visually. It is simply an agreement to consistently set the error bar height using a convention that isn't typically used in scientific papers, but which is easy to explain to non-scientists without statistical jargon.

Brourd

Confidence intervals are used in publications, but they're generally poor ways to describe data unless you have a high confidence value. That is, if I want to say there is error in a result, stating that I'm 90% confident that the true value lies within a range if the experiment is repeated is not going to instill external confidence in that result (especially if it's a large range). It also requires players to understand what a 'true value' is, which I would say falls into the same level of comprehension as standard deviation.

Omei Turnbull, Player Developer

A non-statistician might not be able to state (or even understand) the statistician's meaning as the expected value of a random variable that represents the probability distribution of all possible experimental outcomes. But I doubt there are many players who feel that the words "actual value" as I used it above is a non-intuitive concept that needs further explanation.

If the rationale for drawing standard error bars instead of something that players can easily interpret is to make them more confident of the results, that's deception, not science. If you're simply arguing for smaller error bars, we could use a 65% confidence interval instead of 90%.

jandersonlee

Personally I prefer 95% as a minimum confidence level, and yes it shows you how rough the data/estimate is. At 65% the error bars are smaller but it's also more likely the true value is outside of them.

Brourd

Jandersonlee is right that, preferably, you would want to use a 95% CI (or higher). But with so few data points, a 95% CI would be a fairly large range. Using the confidence interval requires knowing that your expected value may have an equal probability of being any value in that range. Furthermore, it does not necessarily mean that your expected value will be within that range for all future experiments; each experiment will have its own confidence interval associated with it. It also does not provide any information about the distribution of the measurements in the experiment, or about the precision of individual data points.

In reality, the choice of how you want to display error is personal preference and typically the field of study. You can calculate every form of error and give it to players, but I would say that the best method would involve giving the player a numerical score and a range that is normalized to the measurement.
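To see how wide an interval gets with only three replicates, here is a small Python sketch using the Student's t critical value for 2 degrees of freedom, which has a closed form (for other replicate counts you would use a statistics library instead). The replicate values in the example are made up, and the function names are hypothetical.

```python
import math
import statistics

def t2_critical(level):
    """Two-sided critical value t* with P(|T| <= t*) = level, Student's t, 2 df."""
    return level * math.sqrt(2.0 / (1.0 - level * level))

def ci_three_replicates(values, level=0.95):
    """Confidence interval for the mean of exactly three replicate measurements."""
    if len(values) != 3:
        raise ValueError("this closed form is only valid for n = 3 (2 df)")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(3)
    half_width = t2_critical(level) * sem
    return mean - half_width, mean + half_width

replicates = [1.0, 1.1, 1.2]  # hypothetical relative-fidelity readings
for level in (0.65, 0.90, 0.95):
    lo, hi = ci_three_replicates(replicates, level)
    print(f"{level:.0%} CI: {lo:.3f} to {hi:.3f}")
```

With three replicates, the 95% half-width is about 4.3 standard errors, which is why the 95% range looks so large here.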

DigitalEmbrace

From a player perspective, I look at both tables for the 5S results and see little difference among the results, the designs all performed about the same. Then I look at 16S and see two designs score 1.15-1.25 in comparison to the WT 1.0-1.1, and think those two designs did improve on the WT by a solid 10%.

Eli Fisker

Script for highlighting a specific helix


jandersonlee has updated his previous script. Now it can highlight a range of bases. 

Let's say you want to highlight the bases in helix 10 in the 16S rRNA puzzle.




You can find the base numbers for helix 10 in this spreadsheet:

5S and 16S Helix Map

The bases in helix 10 are 198:207,212:219. 

The script is called: Report/Mutate/Mark/Unmark Bases (v1.1)

To run the script, make your own copy and save it as a booster. Then pull up the script from your booster list and enter the bases you wish to highlight.
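The range notation used above can be expanded into individual base numbers, which is handy for cross-checking against the spreadsheet. A small Python sketch, assuming "198:207" means 198 through 207 inclusive (how the script itself interprets ranges isn't documented here); the function name is hypothetical.

```python
def parse_ranges(spec):
    """Expand a spec like '198:207,212:219' into a sorted list of base numbers.

    Assumes 'a:b' means a through b inclusive; single numbers such as '36'
    are accepted too.
    """
    positions = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if ":" in part:
            start, end = (int(x) for x in part.split(":", 1))
            if start > end:
                start, end = end, start
            positions.update(range(start, end + 1))
        else:
            positions.add(int(part))
    return sorted(positions)

helix10 = parse_ranges("198:207,212:219")
print(len(helix10), helix10[0], helix10[-1])  # 18 198 219
```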









Gerry Smith

Comparing the results of these two designs was interesting.  2.18 had much lower results than 2.21, despite having better Delta and a lower number of mutations.  So I wonder if the misfold improvements made by 2.21 were better....

I highlighted the areas where 2.21 did better at correcting misfolds than 2.18.  Perhaps these areas are more critical?



Brourd

A better way to analyze this would be to compare where different mutations were made. It's possible that any mutations to loop sequences resulted in a different tertiary structure, etc. which would definitely not be captured just by comparing secondary structure.

Omei Turnbull, Player Developer

@Gerry, I don't know why Brourd makes such categorical statements. Please keep doing what you are doing.

Gerry Smith

I'm glad he does. I prefer strong views, especially from folks such as Brourd. I will try to incorporate both, and I look forward to hearing how to improve.

Brourd

Initially comparing the secondary structure computational predictions is (imo) a weaker form of comparative analysis for the ribosome. We have some idea of the WT structure, so a better form of initial analysis would be to look at the mutations for each design, and then look at where they are on the WT ribosome structure. I would say it is better to assume that your ribosome sequence is folding into the same structure as the WT ribosome, and then say "where are these mutations. Are they localized to canonical helices, noncanonical helical forms, or disordered loops? Are those residues solvent exposed, or localized to the core?"

If your ribosome based on the WT structure should not plausibly result in some perturbation to the global structure, *then* the focus should be on the predicted secondary structure. How would those misfolds alter your global structure, etc. Comparing the computational predictions for misfolds is going to be misleading, because they may not necessarily correlate to improved activity, and they are difficult hypotheses to either prove or disprove. The sequence for 2.18 may have mutated a residue (or series of residues) in the PTC, which then caused the decrease in activity, and may have absolutely no correlation to the predicted misfolds we see.

I simply point these things out because these are things that should probably be included for players to see in the UI. It would be rather neat to have an alternative view with the ribosome domains overlayed and colored, with the mutations to the sequence colored in. ^_^

Omei Turnbull, Player Developer

Here are Antje's updated results for the 5S and 16S that I referred to in the news post. She did more replications this time, and says that this data should completely supersede the original data. (The original experiments did include the PEG and DTT additives, so the experimental conditions didn't change.)





Astromon

Could we get another key to know which design is which for 5S and 16S?

Omei Turnbull, Player Developer

One of the first things I sometimes do when getting new experimental data is what is called a "hierarchical cluster analysis" of the results. I did this for the 23S data (with additives), and here is what I got. (Ignore the blue, green and gray numbers, which I added manually, for the moment.)



Here I chose to compare the predicted structures of the designs, rather than their sequences. The same technique can also be used on sequences, which can give a different perspective.
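One simple way to compare predicted structures is the base-pair distance: the number of base pairs that occur in one structure but not the other. This Python sketch (hypothetical function names, plain '()' dot-bracket only, no pseudoknots) computes it. The post does not say which distance measure was used, so treat this as one possible choice.

```python
def base_pairs(dot_bracket):
    """Set of (i, j) base pairs, 1-based, from '()' dot-bracket notation."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unmatched '('")
    return pairs

def bp_distance(s1, s2):
    """Base-pair distance: pairs present in exactly one of the two structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    return len(base_pairs(s1) ^ base_pairs(s2))

wt_structure = "((((....))))"
design_structure = "(((......)))"  # toy example: innermost pair lost
print(bp_distance(wt_structure, design_structure))  # 1
```

Computing this for every pair of designs (and the WT) gives the distance table that a clustering step needs.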

For those who are not familiar with hierarchical clustering, it's a way of organizing the designs to give an overview of which designs are the most similar (in this case, in their predicted structure) and which are most dissimilar.

The analysis starts with all the designs grouped together and decides how to divide that group into two subgroups to best minimize the differences within each group while maximizing the difference between the two groups. In this case, it determined that 1.18 was more different from all the others than any of the others were with each other, and so it split out that design from all the others.

At the second step, it calculated that splitting out 1.20 and 1.24 as one group and leaving all the other designs in the other group was the best choice. Since there were only two designs in the first group, there was nothing more to split there, but there were six designs in the second group, so it next worked on that group. It continued in this manner until all the groups had either one or two members.
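The description above builds the tree from the top down by repeatedly splitting groups. Common tools (R's `hclust`, for instance) build such trees from the bottom up instead, repeatedly merging the two closest groups. Here is a small pure-Python sketch of that bottom-up approach with average linkage, assuming you already have pairwise distances (for example base-pair distances between predicted structures). The names and toy distances are hypothetical, and the resulting tree need not match one built by a divisive method.

```python
def average_linkage(names, dist):
    """Bottom-up (agglomerative) hierarchical clustering with average linkage.

    names: unique labels. dist: dict mapping frozenset({a, b}) -> distance.
    Returns (tree, merges): tree is nested tuples of labels; merges lists
    (left, right, height) in the order the groups were joined.
    """
    clusters = {name: [name] for name in names}  # cluster -> its member labels

    def linkage(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        m1, m2 = clusters[c1], clusters[c2]
        total = sum(dist[frozenset((a, b))] for a in m1 for b in m2)
        return total / (len(m1) * len(m2))

    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        candidates = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
        a, b = min(candidates, key=lambda ab: linkage(*ab))
        height = linkage(a, b)
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, height))
    return next(iter(clusters)), merges

pairs = {("WT", "d1"): 1, ("WT", "d2"): 5, ("WT", "d3"): 6,
         ("d1", "d2"): 5, ("d1", "d3"): 6, ("d2", "d3"): 2}
dist = {frozenset(k): v for k, v in pairs.items()}
tree, merges = average_linkage(["WT", "d1", "d2", "d3"], dist)
print(tree)  # (('WT', 'd1'), ('d2', 'd3'))
```

The merge heights play the role of branch heights in a dendrogram: low merges mean very similar designs.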

Looking at the final graph, we see that 1.22's predicted folding was the most similar to that of the WT, and that 1.17 was closest to those two. Designs 1.19 and 1.23 were closer to each other than to their three closest neighbors, and so on up the tree.

The cluster analysis, by itself, doesn't say anything about the experimental results; it is just a structure that can be useful for organizing the data in a way that prompts interesting questions. In this case, I manually annotated each design number with three numbers derived from the data, as described in the key.

For example, 1.20 and 1.24 are close in predicted secondary structure, but extremely different in results. Why so different? In this case, there is an obvious candidate: 1.24 was the design that pretty much ignored the IUPAC constraints reflecting what nature has found works for E. coli's close relatives, the gammaproteobacteria. So this is not a surprise, but it is nice to see a vote of confidence for the relevance of the constraints.

Looking for another pair showing similarity in structure but differences in results, 1.22 and the WT stand out. Design 1.22 didn't violate any of the constraints. What aspect of its difference from the WT caused the problem? That's an interesting question, and I don't have the answer. Any theories?

Anyway, this is meant as just an illustration of why I find hierarchical clustering useful. The two tools I used for this analysis are the RNAFold website for the (Vienna 2) structure prediction and R/RStudio for the cluster analysis. The RNAFold website interface is easy to learn, while R/RStudio is not quite so easy. (But if you're comfortable making CSV files, I have written notes for how to turn such a file into a cluster diagram like the one above.) If anyone would like help in learning to use these tools, this is as good a place as any to ask questions.

Eli Fisker

I find this analysis interesting. 


You asked: "Looking for another pair showing similarity in structure with differences in results, 1.22 and the WT stand out.  Design 1.22 didn't violate any of the constraints. What aspect of the difference with the WT did cause the problem? That's an interesting question, and I don't have the answer. Any theories?"

I played around with the RNAfold website and ran the original sequence and 1.22 through it.

Entropy of 23S original and 1.22



Original:
http://rna.tbi.univie.ac.at//cgi-bin/RNAWebSuite/RNAfold.cgi?PAGE=3&ID=Xly0hVpe9l

2.22: 
http://rna.tbi.univie.ac.at//cgi-bin/RNAWebSuite/RNAfold.cgi?PAGE=3&ID=Kg6LDV8wvE

When I flash back and forth between the two sites, it appears that Gerry's design shows slightly lower and more stable entropy.


When I compare the two structures in Vienna2, what seems to have happened is that the one GC pair that Gerry's design changes into an AU pair keeps the structure (green ring). He changed two other base pairs from GU to AU (red rings), something I have been advocating. (Sorry :) ) In this case, the sequences in those two stems seem to have wandered off and made a section of the ribosome pair up in a new way.

Comparison of 23S WT and 2.22. 



Now, this is only a single case. But if I'm to draw anything from it, it may be that it is structurally safer to change a full base pair than just a single base in a base pair.

There is one more case: my 2.17 design, with a single base change (from GC to GU). This base change seemed to cause a large structural change.

I'm considering whether it would be helpful to have such a structural difference measure in the game while designing, showing the potential structural difference from the WT.
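One standard way to put a number on "structural difference" is the base-pair distance: the count of base pairs that occur in one structure but not the other. The Python sketch below assumes both structures are plain dot-bracket strings of equal length using only `(`, `)` and `.` (no pseudoknot brackets); the function names are hypothetical and not an existing Eterna feature.

```python
def pairs_from_dot_bracket(structure: str) -> set[tuple[int, int]]:
    """Return the base pairs (i, j), 1-based with i < j, of a dot-bracket string."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for pos, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))
        elif char != ".":
            raise ValueError(f"unsupported character {char!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def base_pair_distance(structure_a: str, structure_b: str) -> int:
    """Count base pairs present in exactly one of the two structures."""
    if len(structure_a) != len(structure_b):
        raise ValueError("structures must have the same length")
    return len(pairs_from_dot_bracket(structure_a) ^ pairs_from_dot_bracket(structure_b))


# A prediction that lost the innermost pair (4, 9) of the target:
print(base_pair_distance("((((....))))", "(((......)))"))  # 1
```

Comparing a design's predicted fold with the WT's predicted fold (or with the target structure) this way gives 0 for an exact match, and each gained or lost pair adds 1. It says nothing about where the differences are, so it would complement, not replace, looking at the structures.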

I have found myself wishing that it were possible to lock the engine in the ribosome puzzles. When I open a new design from the lab list, the engine resets to the default, LinearFold. But when I want to view designs in Vienna2, I want to stay in Vienna2.

DigitalEmbrace
Why do you want to use Vienna2 and not LinearFold? I thought LinearFold was better.
Omei Turnbull, Player Developer
LinearFold is definitely better in the sense that it is much faster, and its advantage grows as the RNA gets longer. But it achieves this speed advantage by not considering all possible foldings, particularly those that involve pairs formed by bases that are far apart in the numbering.

Its authors point out specific cases where it seems to get things right when Vienna2 doesn't, but that's true for all the folding engines. I'm not aware of any evidence that LinearFold should be expected, in general, to give better results than Vienna2 for the ribosome.
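To see how much a given structure depends on pairs between distant bases, one can list the span j - i of every pair (i, j) in its dot-bracket string. This is only a diagnostic sketch in Python: LinearFold's speed comes from beam pruning during a left-to-right scan, not from a fixed distance cutoff, so the `min_span` threshold below is an arbitrary, hypothetical parameter.

```python
def pair_spans(structure: str) -> list[int]:
    """Return j - i for every base pair (i, j) in a dot-bracket string, in closing order."""
    stack: list[int] = []
    spans: list[int] = []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at index {pos}")
            spans.append(pos - stack.pop())
    return spans


def count_long_range(structure: str, min_span: int) -> int:
    """Count pairs whose partners are at least min_span positions apart."""
    return sum(1 for span in pair_spans(structure) if span >= min_span)


example = "((..((...))..))"
print(pair_spans(example))            # [4, 6, 12, 14]
print(count_long_range(example, 10))  # 2
```

Run on the 23S target structure, a histogram of these spans would show how many of its pairs lie far apart in sequence.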
DigitalEmbrace
The Arcplot tool provides a different view for evaluating structure similarity. Here are WT and 2.22 in LinearFold.



Only a very minor difference, although losing that small fold could be enough to cause a problem.

And here are WT and 2.22 in Vienna2.


2.22 eliminated a small local misfold but added another global misfold.
DigitalEmbrace
Continuing Eli's line of thought, the two base pairs that were changed from GU to AU (bases 2413 and 2543) are in domain V although not in the PTC as defined below. (BTW 2413 is a purple ring area.)

The 2641-2773 pair in helix 94 is folding correctly in both LinearFold and Vienna2, confirming those two bases are likely not the problem.

As for changing a base pair being safer than changing a single base in a pair, Jieux's 16S hairpin changed two GU pairs to GC pairs.
DigitalEmbrace
2.18 design also contains 2413A and 2543A
However, 2.19 contains 2543A and it performed almost as well as WT.
Perhaps 2413 is the culprit?
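The reasoning above is a set comparison: keep the mutations shared by every poorly performing design, then discard any that also appear in a design that worked. A minimal Python sketch, using only the mutations mentioned in this post (the real designs carry many more, so this illustrates the logic rather than reproducing the analysis). Co-occurrence like this can only nominate suspects; it does not show that a mutation caused the drop.

```python
def candidate_culprits(poor: dict[str, set[str]], good: dict[str, set[str]]) -> set[str]:
    """Mutations present in every poor design and in no good design."""
    if not poor:
        raise ValueError("need at least one poorly performing design")
    shared_by_poor = set.intersection(*poor.values())
    seen_in_good = set().union(*good.values())
    return shared_by_poor - seen_in_good


# Only the mutations discussed in this post; all other mutations omitted.
poor_designs = {"2.18": {"2413A", "2543A"}}
good_designs = {"2.19": {"2543A"}}
print(candidate_culprits(poor_designs, good_designs))  # {'2413A'}
```

Adding more poor designs narrows the intersection, and adding more good designs removes more candidates, so the list of suspects shrinks as more lab data come in.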
Eli Fisker
5S and 16S data with mutation count with PEG and DTT


I redid my previous data image with mutation counts for the 5S and 16S, since we now have new and better data.



23S data with mutation count with PEG and DTT




Data with mutation count, as DigitalEmbrace requested.


Eli Fisker
5S, 16S and 23S data with mutation count, but without PEG and DTT

emmarockit
Hi all, I'm so glad that you are all having fun with the new set of data. You might know this already, but design 2.24 has a mutation in the active site of the ribosome, and I am very confident that this caused the "no activity".
DigitalEmbrace
I take it the active site is the PTC? A couple of us were thinking we should leave that area alone. What about the exit tunnel? Specifically the outer half of the exit tunnel.
Eli Fisker
I would love an option for an in-game highlight of the active site, so that we do not alter it by accident, only when we intend to.
Eli Fisker
Alternatively, perhaps we could get a ribosome training puzzle with the active site locked, so we learn to recognize it.  
whbob
dl's 2.24 has mutations in the 2200 to 2500 sequence locations. That's probably the bad lands area. Gerry's 2.18 has two mutations in the 2100-2200 area. I've read that some stem-elbow-stem structures like the one in Gerry's area are known to be involved in protein production. We know Gerry's mods interfered with production. I'm excited to find out if other mods can enhance production.
  
DigitalEmbrace
@emmarockit Can you define the range of bases in the PTC (or what you consider the active site), so that we know where to avoid making mutations? I'm going to go with 2058-2610 (domain V) for now.
emmarockit
The bases of the PTC (23S rRNA) are: 2057-2063, 2447-2456, 2496-2507, 2582-2588, 2602, 2606-2611. Two more regions are important: the P-loop with 2250-2254, and the A-loop: 2548-2560.  
I would like to add two thoughts, though. First, changing the nucleotides in this region can endanger ribosome function independently of the folding. Second, there is some degree of flexibility in these nucleotides, meaning not all mutations 'kill' translation, and we do not yet know whether compensatory mutations exist.
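The ranges above can be turned into a quick check of whether a planned mutation lands in the PTC, P-loop or A-loop. A Python sketch using exactly the 23S positions listed in this post; the names are hypothetical, and as noted, a position outside these ranges is not automatically safe, nor is every change inside them fatal.

```python
# 23S rRNA positions from emmarockit's post (inclusive ranges).
ACTIVE_SITE_REGIONS: dict[str, list[tuple[int, int]]] = {
    "PTC": [(2057, 2063), (2447, 2456), (2496, 2507),
            (2582, 2588), (2602, 2602), (2606, 2611)],
    "P-loop": [(2250, 2254)],
    "A-loop": [(2548, 2560)],
}


def active_site_hits(positions: list[int]) -> dict[int, list[str]]:
    """Map each mutated position inside a listed region to the region name(s)."""
    hits: dict[int, list[str]] = {}
    for pos in positions:
        regions = [name for name, spans in ACTIVE_SITE_REGIONS.items()
                   if any(start <= pos <= end for start, end in spans)]
        if regions:
            hits[pos] = regions
    return hits


print(active_site_hits([2413, 2451, 2543, 2552]))  # {2451: ['PTC'], 2552: ['A-loop']}
```

With these ranges, 2413 and 2543 (the two GU-to-AU positions discussed earlier) fall outside all three regions.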
Omei Turnbull, Player Developer
I just came across this recent review of ribosome engineering by members of the Jewett lab. It looks like an excellent resource, accessible to the players who are actively participating here.
Gerry Smith
Love their question, "How can we use what's occurred in evolution in context of engineering design". 
DigitalEmbrace
I copied the base mutations for each 23S tested design into a doc for quick reference:  
https://docs.google.com/document/d/1eFVOYnWf_V5PPVybWEAC8CZtS_xBIjRJkjk1FtTPprw/edit?usp=sharing
Eli Fisker
Helix map for 23S


Here is a helix map for 23S in jandersonlee style. (Third sheet)


5S, 16S and 23S Helix Map


Big thanks to Gerry for sending me the numbers for the base pairs; this saved a lot of work. Thanks also for catching a good number of errors.

NB: The areas in red are those where there were discrepancies between Noller's ribosome image (http://rna.ucsc.edu/rnacenter/images/figs/ecoli_23s.jpg) and Eterna's dot-bracket structure.
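For anyone wanting to build a similar helix map from a dot-bracket string, a simple Python sketch is to group stacked pairs (i, j), (i+1, j-1), ... into runs. This strict definition is an assumption: it splits a helix at every bulge or internal loop, so its runs will not line up one-to-one with the named helices (H1, H2, ...) in Noller's diagram.

```python
def dot_bracket_pairs(structure: str) -> dict[int, int]:
    """Map each opening position to its closing partner (1-based)."""
    stack: list[int] = []
    partner: dict[int, int] = {}
    for pos, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            partner[stack.pop()] = pos
    return partner


def stacked_runs(structure: str) -> list[tuple[int, int, int, int]]:
    """Return (5' start, 5' end, 3' start, 3' end) for each run of stacked pairs."""
    partner = dot_bracket_pairs(structure)
    runs = []
    for i in sorted(partner):
        j = partner[i]
        if partner.get(i - 1) == j + 1:
            continue  # (i, j) extends a run that started at a lower i
        length = 1
        while partner.get(i + length) == j - length:
            length += 1
        runs.append((i, i + length - 1, j - length + 1, j))
    return runs


print(stacked_runs("((((...))..))"))  # [(1, 2, 12, 13), (3, 4, 8, 9)]
```

Running the same function on two dot-bracket versions of the structure (for example, one transcribed from the Noller image and Eterna's) and comparing the run lists is one way to spot where they disagree.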



Gerry Smith
Using Eli's Helix Map, I created a draft table that compares misfolds between designs. I'm not sure whether this sort of table or some other form is helpful; I just thought that adding another way to compare misfolding areas might be useful.

 https://docs.google.com/spreadsheets/d/11rhtbDZCjL3TBKhEiGqEKDYMgAbOiuEQNS9KYvzlKDI/edit#gid=1169443705
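One way to fill in such a table automatically: for each helix, take the target's base pairs whose 5' base lies in that helix's 5' range, and report the fraction the design's predicted structure keeps. A Python sketch; the helix names and ranges below are toy values, not the real 23S helix map, and which structures to compare (target vs. a Vienna2 or LinearFold prediction) is up to the user.

```python
def pair_set(structure: str) -> set[tuple[int, int]]:
    """Base pairs (i, j), 1-based, of a well-formed dot-bracket string."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for pos, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def misfold_table(target: str, predictions: dict[str, str],
                  helix_map: dict[str, tuple[int, int]]) -> dict[str, dict[str, float]]:
    """For each design and helix, the fraction of target pairs the prediction keeps."""
    target_pairs = pair_set(target)
    table: dict[str, dict[str, float]] = {}
    for design, predicted in predictions.items():
        kept = pair_set(predicted)
        row: dict[str, float] = {}
        for helix, (start, end) in helix_map.items():
            wanted = {p for p in target_pairs if start <= p[0] <= end}
            if wanted:
                row[helix] = len(wanted & kept) / len(wanted)
        table[design] = row
    return table


# Toy example: a 12-nt hairpin split into two hypothetical "helices".
target = "((((....))))"
helices = {"H_outer": (1, 2), "H_inner": (3, 4)}
print(misfold_table(target, {"A": "((((....))))", "B": "((........))"}, helices))
# {'A': {'H_outer': 1.0, 'H_inner': 1.0}, 'B': {'H_outer': 1.0, 'H_inner': 0.0}}
```

A value of 1.0 means the helix folds as in the target; anything lower marks a misfold area, and lining these rows up next to the lab scores gives the kind of comparison described above.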
Eli Fisker
Gerry, I love what you have done here. I would like you to do exactly the same for all the designs in the 23S lab and add the score for each design.

I think comparing misfolds will be super useful. We can use this to identify which misfolds are critical to fix and which we can safely ignore.

Perhaps afterwards you could add a chart that shows the designs' misfold areas against each other, alongside the scores.

Basically, I think we can use this to identify some of the areas in the ribosome that need our special love.
Gerry Smith
Here is a hypothesis for 23S that I am thinking about testing:

Assumptions:
1.  Given Astro's 2.21 high score and many mutations, there are positive and negative mutations in this design.
2.  Given low scores by 2.18, 2.20 and 2.24, these had no positive mutation effects.

Test:
Delete from Astro's design all the mutations it shares with the three low-scoring designs, and see if that improves 2.21's score. Deletions (common mutations) are:

1119 C
1041/1114 AU
2543 A
257 A
1033 A
1044 A
1045 C
1110 A
1210 A
1606 A
1694 A
2866 G
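The proposed test can be written as two small steps: find which of 2.21's mutations also occur in the low-scoring designs, then set those positions back to the WT base. A Python sketch with toy 8-nt sequences; the post doesn't say whether a mutation must appear in all three low designs or just one, so that is left as a parameter (`min_designs`, a hypothetical name). Base-pair changes such as 1041/1114 count as two single-base mutations here.

```python
def mutations(design: str, wild_type: str) -> set[tuple[int, str]]:
    """(1-based position, design base) for every position that differs from the WT."""
    if len(design) != len(wild_type):
        raise ValueError("sequences must have the same length")
    return {(pos, d) for pos, (d, w) in enumerate(zip(design, wild_type), start=1) if d != w}


def shared_mutations(target: set[tuple[int, str]], others: list[set[tuple[int, str]]],
                     min_designs: int = 1) -> set[tuple[int, str]]:
    """Mutations of `target` that appear in at least `min_designs` of `others`."""
    return {m for m in target if sum(m in other for other in others) >= min_designs}


def revert(design: str, wild_type: str, positions: set[int]) -> str:
    """Return `design` with the given 1-based positions set back to the WT base."""
    bases = list(design)
    for pos in positions:
        bases[pos - 1] = wild_type[pos - 1]
    return "".join(bases)


wt = "GGAAUUCC"
best = "GCAAUACC"                    # toy stand-in for 2.21
low = [mutations("GCAAGUCC", wt),    # toy stand-ins for the low scorers
       mutations("GGUAUUCC", wt)]
common = shared_mutations(mutations(best, wt), low)
print(sorted(common))                                # [(2, 'C')]
print(revert(best, wt, {pos for pos, _ in common}))  # GGAAUACC
```

Setting `min_designs=3` would keep only mutations found in all three low designs, a stricter and probably shorter deletion list.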


Omei Turnbull, Player Developer
It seems to me that "Given low scores by 2.18, 2.20 and 2.24, these had no positive mutation effects." is an overreach. I suspect that there are bases that play such a specific role in assembly or translation that no "good" mutations at other positions could compensate. Nevertheless, the specificity of the hypothesis leads to a very reasonable design to test. 
Gerry Smith
I'd like to try to share/crowdsource our hypotheses in order to design, improve and select the ones we think are best. For this one, I'm really just focused on how to uncover improvements we have made that were then masked by undesirable mutations. Looking for the best ways to do this...
whbob
I would like to share a Google doc with more information about these lab solutions: https://docs.google.com/document/d/1nX7-ZSzHVDUPbH4oZ3hA2_UAx5UZmwlRbe6mOpLxbuQ/edit?usp=sharing
I've made some comparisons between these sequences.
I think that looking at sequences with widely different performance is a great start. As you suggest, narrowing the areas where performance changes happen is a good path to learn more about what controls protein production.
That said, I believe that there is still a need for pure research covering a broad search for interesting areas of the ribosome.
whbob
@Gerry: I have cleaned up my document and added a potential hypothesis.
Gerry Smith
@whbob So if I understand your hypothesis, you have selected mutations within one area of Astro's design, Domain 5, that you think have a positive effect, and you want to see whether substituting those Domain 5 mutations for the mutations made in designs 2.18, 2.20 and 2.24 will improve them. Is that correct?
whbob
Yes, domain 5 has a lot to do with protein production. By changing one variable (D5) in the 3 poor producing solutions, does production improve? Also, other domains can be taken from Astro's solution and added into the 3 test solutions. 
Several players could participate with different combinations, which might give us more information on domain effects on misfolds, etc.
DigitalEmbrace
While Astromon's design performed the best among our designs, it did not improve upon the WT so I don't see the utility of pulling ideas from that design to try and fix the poorer designs. I think we will learn more from completely new designs testing new design approaches. If you would like to test your hypothesis in your designs, I will certainly vote for one but I'm not in favor of devoting several slots. I just think it may be a bit premature to focus so much on one design. How can we learn more from the structural differences among the designs or mutation patterns?
DigitalEmbrace
I like your tool for identifying differences between two designs quickly, that could be handy! What does the Hamming distance tell us?
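For reference: the Hamming distance between two equal-length sequences is the number of positions at which they differ. Measured against the WT, it is simply a design's mutation count; between two designs, it counts the positions where they disagree (a base-pair swap counts as two). A minimal Python version:

```python
def hamming(seq_a: str, seq_b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


print(hamming("GGAAUUCC", "GCAAUACC"))  # 2
```

It says nothing about where the differences are or how they affect folding; two designs at the same distance from the WT can behave very differently.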
whbob
@DigitalEmbrace: I agree. In my first comment to Gerry above:

 "That said, I believe that there is still a need for pure research covering a broad search for interesting areas of the ribosome."

I'm hopeful that we can get 3 rounds of 48 submissions and that voting winners will be very diverse. 
Eli Fisker

RNA motifs - cheat sheet for the ribosome


I think RNA motifs will be really helpful for further limiting the task of which bases we are most likely to get away with modifying in our ribosome labs. However they may not all be equal.


DigitalEmbrace: Many of our best designs have mutated bases in the motifs, so disrupting these motifs is not necessarily a bad thing. I wonder if perhaps altering certain motifs can sometimes be beneficial, or at least benign. Also, I'm still figuring out which NTs are flexible within each motif style. For example, it appears the Watson-Crick pair can be changed within the A Minor motif without disturbing the structure? 


Andy Watkins: really interesting! yeah, there’s no guarantee that each motif is so critical that on its own mutations would kill ribosome function, or anything like that

Astromon: what is a motif? (A Minor motif) this sounds like a guitar chord to me :)


Andy Watkins: a motif is a recurrent, conserved feature in RNA structure. To be a little too cute about it, it’s anything other than a helix.


Andy Watkins: from the eterna-perspective — you know how one really great way to stabilize a 2/2 bulge is to make it GU/UG? that’s a sort of motif.

in particular the reason people are interested in motifs is because it seems that their properties — in particular their 3D structure — are relatively independent of their sequence context


Andy Watkins: not all motifs have this “modularity” property, but many do


Rhiju has some fine lists with the base positions of where the RNA motifs are in the ribosome.


Rhiju’s 16S RNA motif list

Rhiju’s 23S RNA motif list



Which motifs are safe to change?


DigitalEmbrace: “I wonder whether we need to conserve all these motifs in the ribosome. Perhaps a structure formed by a motif is hindering performance? Perhaps a certain motif in a certain location is the problem? The A Minor motif helps form tertiary connections so I think those are the most important to preserve.”


She volunteered to compare the RNA motif list with our 23S lab results, to see how the designs that mutated bases in motifs performed, and whether violations of specific motifs were more or less involved in ribosome failure than others.




Pilot 23S: Motif violations


23S thoughts


Notes: Shaded Loop E motifs are covered by the corresponding Bulged-G motif. I only searched for the first base in the A Minor motifs, the “A” component. I only searched for the first component (2NTs) of the UA handle.


23S

2.17 - 0 motif violations - does fair (1M)

2.18 - 0 motif violations - does bad (32M)

2.19 - 1 motif violation - does fair (15M)   (Platform/Bulged G)

2.20 - 14 motif violations - does bad (72M)  (Platform/GA minor, Loop E, GA minor, Platform, Bulged G, U-Turn, GA minor, U-Turn, Bulged-G/Platform, U-Turn, Platform)  *Two bases were changed in U-Turn, GA Minor, U-Turn*

2.21 - 4 motif violations - does fair (43M)  (UA handle, GA minor, Tandem GA)

2.22 - 0 motif violations - does OK (4M)

2.23 - 1 motif violation - does fair (7M)  (A minor non-WC pair)

2.24 - 4 motif violations - does bad (35M)  (A minor, Platform, U-Turn, Platform/Bulged G)


5 designs that do rather well have 6 motif violations in total - 1.2 violations per design

Together they have 70 mutations - 14 mutations per design


3 designs that do badly have 18 motif violations in total - 6 violations per design

Together they have 139 mutations - 46.33 mutations per design




Pilot 16S: Mutations



16S Thoughts



2.09 - 0 motif violations - does fair (2M)

2.10 - 0 motif violations - does fair (13M)

2.11 - 1 motif violation - does fair (21M) (Z-turn)

2.12 - 2 motif violations - does bad (55M) (2 A-minor)

2.13 - 0 motif violations - does OK (17M) 

2.14 - 2 motif violations - does bad (11M) (Platform, GA-minor)

2.15 - 2 motif violations - does fair (4M) (Same Z-turn motif) 

2.16 - 4 motif violations - does bad (55M) (2 A-minor, Z-turn, Loop E submotif)


5 designs that do rather well have 3 motif violations in total - 0.6 per design.

Together they have 57 mutations - 11.4 mutations per design


3 designs that do badly have 8 motif violations in total - 2.67 per design

Together they have 121 mutations - 40.33 mutations per design
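The group totals and averages can be recomputed directly from the per-design lines. A Python sketch; the numbers are copied from the 23S and 16S lists above, with "well" grouping the designs marked fair or OK and "bad" grouping the ones marked bad.

```python
# (motif violations, mutations, outcome) per design, copied from the lists above.
DESIGNS_23S = {
    "2.17": (0, 1, "well"), "2.18": (0, 32, "bad"), "2.19": (1, 15, "well"),
    "2.20": (14, 72, "bad"), "2.21": (4, 43, "well"), "2.22": (0, 4, "well"),
    "2.23": (1, 7, "well"), "2.24": (4, 35, "bad"),
}
DESIGNS_16S = {
    "2.09": (0, 2, "well"), "2.10": (0, 13, "well"), "2.11": (1, 21, "well"),
    "2.12": (2, 55, "bad"), "2.13": (0, 17, "well"), "2.14": (2, 11, "bad"),
    "2.15": (2, 4, "well"), "2.16": (4, 55, "bad"),
}


def summarize(designs: dict[str, tuple[int, int, str]]) -> dict[str, tuple]:
    """Per outcome group: (designs, total violations, mean, total mutations, mean)."""
    summary = {}
    for group in ("well", "bad"):
        rows = [(v, m) for v, m, outcome in designs.values() if outcome == group]
        n = len(rows)
        violations = sum(v for v, _ in rows)
        mutations = sum(m for _, m in rows)
        summary[group] = (n, violations, round(violations / n, 2),
                          mutations, round(mutations / n, 2))
    return summary


for name, data in (("23S", DESIGNS_23S), ("16S", DESIGNS_16S)):
    print(name, summarize(data))
# 23S {'well': (5, 6, 1.2, 70, 14.0), 'bad': (3, 18, 6.0, 139, 46.33)}
# 16S {'well': (5, 3, 0.6, 57, 11.4), 'bad': (3, 8, 2.67, 121, 40.33)}
```

Keeping the per-design numbers in one place like this makes it easy to redo the sums whenever a design is reclassified or new data arrive.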



Sum up on motif violations


Four designs (2.11, 2.15, 2.19 and 2.21) performed well despite motif violations. 


Three defective designs changed a base in an A Minor (2.12, 2.16, and 2.24), but those were the designs with the highest number of mutations, so we can’t necessarily blame the A Minor.


In 2 out of 3 cases (2.11 and 2.15), the Z-turn violations don't seem to harm the design.


A high number of motif violations does seem to turn up in designs that do less well. However, for both 23S and 16S, a high number of motif violations combined with a high number of mutations seems extra bad.



Afterthought


Perhaps, by combining the conserved bases (IUPAC) with the motif positions, we can work out which motifs are most conserved and which motifs we are more likely to get away with modifying or breaking.



DigitalEmbrace
I should clarify my comment about the A Minor motif. I believe all these motifs involve tertiary connections. What makes A Minors potentially more necessary is their contribution to the binding of tRNA to the 23S subunit.
Gerry Smith
I need a couple of clarifications on the U-Turn motif. Here is a U-Turn from 16S, NT14-16, with the first NT being U, the second any NT, and the third either G or A (a purine).

My questions are:
1.  Can this sequence occur anywhere in the loop? In the example below, it does not start at the very beginning of the loop.

2.  It must always be "flanked" (white highlights) by either UU, UC, CU, CC, UA, CA, GA or AG.  Correct?

Omei Turnbull, Player Developer
@Gerry In response to your specific questions, the U-Turn motif is defined by a 3D structure, not a specific sequence. I'm not aware of any additional requirements on the flanking bases, but that doesn't mean they don't exist. What led you to ask about them?

In addition, I just recently learned that since the original papers on the U-Turn motif were written, another motif has been discovered which is almost the same, but, when present, is formed by the sequence GNR instead of UNR. So now there are two recognized sub-types of the U-Turn motif. The ribosome has examples of both. If you look at any of the documents being produced that contain both the base positions of the U-Turn motif and the ribosome sequence, you can see which type of U-Turn it is by seeing whether the first base is a G or a U.
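Since the U-Turn is defined by its 3D structure, a sequence scan can only flag places worth checking against the motif lists; a UNR or GNR match is neither proof of a U-Turn nor, as far as this thread establishes, a strict requirement. With that caveat, a Python sketch that reports every UNR and GNR occurrence (N = any base, R = A or G) in a loop, including overlapping ones:

```python
import re

# Lookahead groups let overlapping matches be reported (e.g. GUG and UGA in GUGA).
TURN_PATTERNS = {
    "UNR": re.compile(r"(?=(U[ACGU][AG]))"),
    "GNR": re.compile(r"(?=(G[ACGU][AG]))"),
}


def turn_candidates(loop: str) -> list[tuple[int, str, str]]:
    """(1-based start, matched triplet, pattern name) for each UNR/GNR match."""
    hits = []
    for name, pattern in TURN_PATTERNS.items():
        for match in pattern.finditer(loop.upper()):
            hits.append((match.start() + 1, match.group(1), name))
    return sorted(hits)


print(turn_candidates("GUGAAAC"))
# [(1, 'GUG', 'GNR'), (2, 'UGA', 'UNR'), (3, 'GAA', 'GNR')]
```

The scan answers Gerry's first question only at the sequence level: matches can start anywhere in the loop, and no flanking bases are checked.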
Eli Fisker
RNA Motif Explanations


DigitalEmbrace asked me: Let me know if you find a clear definition of what each motif is.

The motifs in Rhiju's RNA motif lists are called things like A_Minor, GA minor, Bulged G, GNRA Tetraloop, Incalated T-Loop, Loop-E submotif, Platform, P-Loop, Tandem GA Sheared, T-Loop, UA Handle and U-Turn.

RNA Motif Definitions


I have dug up a bunch of papers with explanations and started a list. I haven't added all the explanations yet, and I will update the list as I come across better definitions.
 


Eli Fisker

One base moonlighting as two motifs?


DigitalEmbrace noticed that specific bases seemed to belong to more than one motif.


Switch motif? ;)



DigitalEmbrace: Just so you know, sometimes bases are listed in more than one motif.


DigitalEmbrace: Can a base be involved in two different motifs in the 3D structure?


Andy Watkins: yes, in a couple of different ways.


for example, consider:


U_TURN C:QA:307-309    

UA_HANDLE C:QA:305-306 C:QA:310 C:QA:312


put these two together, and you get


T_LOOP C:QA:305-310 C:QA:312


there is a “compositional” element here. some motif definitions are more granular than others, and “submotifs” exist.


DigitalEmbrace:  Example: A_MINOR C:QA:917 C:RA:79 C:RA:97 and LOOP_E_SUBMOTIF C:QA:915-917 C:QA:860-862 both involve 917.  Three contain 1085: U_TURN C:QA:1083-1085 and A_MINOR C:QA:1085 C:QA:1055 C:QA:1104 and PLATFORM C:QA:1083 C:QA:1085-1086 C:QA:1082.  Two involve 1393: U_TURN C:QA:1391-1393 and A_MINOR C:QA:1393 C:QA:1338 C:QA:1314.


Andy Watkins: very easy for A_MINOR to involve bases something else involves. Two of those bases are BPed to each other and the third is an adenosine making some interactions with the BP
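Andy's composition example and DigitalEmbrace's overlaps can both be checked mechanically by expanding each motif entry into a set of residues. A Python sketch; it assumes each entry is written as space-separated `X:CHAIN:START-END` or `X:CHAIN:POS` tokens, as in the examples above, and it ignores the first field (the `C` prefix, whose meaning isn't stated here). Motif labels must be unique, so repeated motif types need distinguishing labels.

```python
from collections import defaultdict


def motif_residues(entry: str) -> set[tuple[str, int]]:
    """Expand e.g. 'C:QA:305-306 C:QA:310' into {('QA', 305), ('QA', 306), ('QA', 310)}."""
    residues: set[tuple[str, int]] = set()
    for token in entry.split():
        _prefix, chain, numbers = token.split(":")
        start, _, end = numbers.partition("-")
        first, last = int(start), int(end or start)
        residues.update((chain, n) for n in range(first, last + 1))
    return residues


def shared_residues(motifs: dict[str, str]) -> dict[tuple[str, int], list[str]]:
    """Residues that belong to more than one of the given motif entries."""
    owners: dict[tuple[str, int], list[str]] = defaultdict(list)
    for label, entry in motifs.items():
        for residue in sorted(motif_residues(entry)):
            owners[residue].append(label)
    return {res: labels for res, labels in owners.items() if len(labels) > 1}


# Andy's example: U_TURN + UA_HANDLE covers exactly the T_LOOP residues.
print(motif_residues("C:QA:307-309") | motif_residues("C:QA:305-306 C:QA:310 C:QA:312")
      == motif_residues("C:QA:305-310 C:QA:312"))  # True

# DigitalEmbrace's 1085 example.
print(shared_residues({
    "U_TURN": "C:QA:1083-1085",
    "A_MINOR": "C:QA:1085 C:QA:1055 C:QA:1104",
    "PLATFORM": "C:QA:1083 C:QA:1085-1086 C:QA:1082",
}))
# {('QA', 1083): ['U_TURN', 'PLATFORM'], ('QA', 1085): ['U_TURN', 'A_MINOR', 'PLATFORM']}
```

Keeping the chain ID in each residue matters for entries like A_MINOR C:QA:917 C:RA:79 C:RA:97, which span two chains.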


DigitalEmbrace: Great, thank you!



I have found a paper with an image that shows the additive nature of some motifs. 





Figure 1 from Probing the structural hierarchy and energy landscape of an RNA T-loop hairpin


It seems that the UA handle in this case is a submotif in a bigger motif. Read the text under the image in the paper.


In another paper, I read that submotifs can be added together into bigger motifs. So motifs are sort of like Legos: building blocks.






DigitalEmbrace

I've been thinking about the significant global misfolds LinearFold (and Vienna 2 to an even greater extent) are modeling in the WT ribosome. If such large misfolds were actually happening, the ribosome would not be able to function. This makes me wonder, how close are these helices to each other in the folded ribosome? Can I view them in a 3D representation?

The helices involved are:

25-50/1970-2000 H2-H4/H61,H64

50-65/1930-1945 H5, H6/H71

175-180/1830-1835 Loop between H10, H11/H67

265-270/1765-1770 H14/H64

440/1750 H4/H63

465/1725-1730 H23/H63

560-570/1675-1685 H25/H60,H62

515-530/1710-1720 H2/H63

(Bases are approximate)

It’s pretty much domain I interacting with domain IV. How close is domain IV to domain I? Are they interacting with each other in a way that is impeding ribosome efficiency? Or is the issue in our modeling?

I lean towards thinking the modeling is incomplete and am not nearly as focused on the global misfolds as I was initially. Instead I now wonder if we may find more benefit from addressing local misfolds.

Diagram indicating no base pairing between domain I and domain IV.

If Eli or anyone else who has learned the 3D molecular modeller wants to pull up these helices, I’d be curious to learn the position of these helices in relation to their possible mispair.


Eli Fisker

Hi DigitalEmbrace! 


Here are some screenshots from Chimera for the misfolds you are requesting. 


I have viewed the following ribosome from PDB in the program Chimera: https://www.rcsb.org/structure/4ybb


For instructions on how to get started, see this post




Misfold image 1



25-50/1970-2000 H2-H4/H61,H64

Commandline: select : 25-50.DA,1970-2000.DA



Misfold image 2




50-65/1930-1945 H5, H6/H71

Commandline: select : 50-65.DA,1930-1945.DA



Misfold image 3




175-180/1830-1835 Loop between H10, H11/H67

Commandline: select : 175-180.DA,1830-1835.DA



Misfold image 4




265-270/1765-1770 H14/H64

Commandline: select : 265-270.DA,1765-1770.DA



Misfold image 5



440/1750 H4/H63

Commandline: select : 440.DA,1750.DA




Misfold image 6



465/1725-1730 H23/H63

Commandline: select : 465.DA,1725-1730.DA


Misfold image 7



560-570/1675-1685 H25/H60,H62

Commandline: select : 560-570.DA,1675-1685.DA


Misfold image 8



515-530/1710-1720 H2/H63

Commandline: select : 515-530.DA,1710-1720.DA


DigitalEmbrace
Thank you, Eli! Wow, those helices are all pretty far apart. Seems unlikely they are misfolding with each other.
Eli Fisker
One thing you could do is check the sequence pairs and see whether they match. This may be what Vienna2 and LinearFold are reacting to.
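One rough way to do that check: line up one stretch 5' to 3' against the other read 3' to 5' (as the two strands of a helix would run) and count positions that could form Watson-Crick or G-U pairs. A Python sketch for two equal-length windows (for example, windows taken from 25-50 and 1970-2000 in the WT sequence); it does not try other offsets or allow bulges, and a high count only means a pairing is possible, not that it forms.

```python
# Watson-Crick pairs plus the G-U wobble pair.
CAN_PAIR = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pairing_profile(stretch_a: str, stretch_b: str) -> tuple[int, int]:
    """Align stretch_a antiparallel to stretch_b (both given 5'->3').

    Returns (positions that could pair, longest unbroken run of such positions).
    """
    if len(stretch_a) != len(stretch_b):
        raise ValueError("compare stretches of equal length")
    matches = [(a, b) in CAN_PAIR
               for a, b in zip(stretch_a.upper(), reversed(stretch_b.upper()))]
    longest = run = 0
    for ok in matches:
        run = run + 1 if ok else 0
        longest = max(longest, run)
    return sum(matches), longest


print(pairing_profile("GGACA", "AGUCC"))  # (4, 4)
print(pairing_profile("GAGAG", "CACAC"))  # (3, 1)
```

The longest unbroken run is the more telling number, since a helix needs consecutive stacked pairs; scattered matches are much less likely to drive a misfold.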
DigitalEmbrace
Haven't ever noticed any mispairings in the Eterna secondary structure of the wild-type ribosome.
Eli Fisker
I took one of your helix sets: 25-50/1970-2000 H2-H4/H61,H64

I marked the bases 25-50/1970-2000. I was in the LinearFold engine. The two base stretches are some way apart from each other. 



I flipped to natural mode, and the LinearFold engine predicts that the two sets of bases misfold with each other.



Many of the bases in the two stretches are complementary to each other.

In this specific case, the same does not happen in Vienna and Vienna2. NUPACK is grumpy about giving natural mode.

If you beam the natural mode to the puzzlemaker, it will give you the structure of the WT, with mispairings.