Coder Required: Secondary Structure Classification of RNA Switches

  • 3
  • Idea
  • Updated 2 years ago
This post is a call for a coder in the Eterna community to take part in an initiative to classify the predicted secondary structures of the tens of thousands of RNA switches that have been designed by players for the Chip-Riboswitch labs. The rules for the algorithm are detailed in this document:

https://docs.google.com/document/d/1e...

Some minor notes about the algorithm include that it may be necessary to code multiple iterations of the algorithm until a set of instructions for classifying any structure is found. Also, if players are interested in contributing to the ground rules for the algorithm, that should be possible as well.

Any players interested in this may comment here or send a private message to me (Brourd) in game.

Brourd

  • 437 Posts
  • 79 Reply Likes

Posted 2 years ago


cynwulf28

  • 80 Posts
  • 22 Reply Likes
Thank you for this post Brourd,


  I spent some time thinking about this sort of thing while reviewing lab submissions previously and I find it curious how domains are defined for these riboswitches.

  Simply to illustrate everything here as an example, I will use the puzzle: [switch2.5][2 states] Fun with John 0  http://www.eternagame.org/game/puzzle/3883755/   by jnicol.

    I think that most people would immediately jump to the assumption that the domains are specific structural motifs, such as a particular hairpin/bulge/etc., which could be represented in Dot-Bracket notation. That is, (for colored annotation click here: http://prnt.sc/emje9h )

State 1 [Unbound]:

   (((((((((((((((((((((...)).)).)).)).))).)).)).)).)).))

State 2 [Bound]:  

   ((.((.(((((....)))))..............(((((....))))).)).))

 My first instinct is to assume that (((...))) would constitute a specific “domain”.

 

     Perhaps to abstract one level out, we could consider a binary state for the bases: either paired or unpaired.

State 1 [Unbound]:

   111111111111111111111000110110110110111011011011011011

State 2 [Bound]: 

   110110111110000111110000000000000011111000011111011011
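As a rough illustration of the paired/unpaired abstraction above, here is a minimal Python sketch. The function name `paired_mask` is made up for this example and is not part of Eterna or of the linked document.

```python
def paired_mask(dot_bracket: str) -> str:
    """Map each position to '1' if it is base paired and '0' if it is unpaired."""
    if not set(dot_bracket) <= set("()."):
        raise ValueError("expected only '(', ')' and '.'")
    # Both '(' and ')' count as paired; only '.' is unpaired.
    return "".join("0" if c == "." else "1" for c in dot_bracket)


print(paired_mask("((.((...)).))"))  # prints 1101100011011
```

Note that this throws away which base pairs with which, so two different structures can share one mask.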

 

Or perhaps one might suspect that a “domain” would be defined by a specific Sequence, such as

5’-GGCCCCGCGUGCCACCACGCCAAAGGAUGAUGAGUCGGCUACAGCUGGAGGACC-3’

or using the degenerate codes:

5’-RRYYYYRYRYRYYRYYRYRYYRRRRRRYRRYRRRYYRRYYRYRRYYRRRRRRYY-3’

5’-KKMMMMKMKKKMMMMMMMKMMMMMKKMKKMKKMKKMKKMKMMMKMKKKMKKMMM-3’
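The two degenerate strings above use the standard IUPAC codes: R/Y for purine/pyrimidine and K/M for keto/amino. A small Python sketch of that mapping follows; the helper name `degenerate` is only illustrative.

```python
# IUPAC two-letter classes for RNA bases.
PURINE_PYRIMIDINE = {"A": "R", "G": "R", "C": "Y", "U": "Y"}
KETO_AMINO = {"G": "K", "U": "K", "A": "M", "C": "M"}


def degenerate(seq: str, table: dict) -> str:
    """Rewrite an RNA sequence using one of the degenerate code tables."""
    return "".join(table[base] for base in seq.upper())


prefix = "GGCCCCGCGU"  # first ten bases of the sequence quoted above
print(degenerate(prefix, PURINE_PYRIMIDINE))  # prints RRYYYYRYRY
print(degenerate(prefix, KETO_AMINO))         # prints KKMMMMKMKK
```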

 
      However, in its purest form, the algorithm presented as an example identifies regions neither by secondary structure nor by ribonucleotide sequence; it considers only the interactions (by bonding) of parts of Shape/State A with Shape/State B, Shape/State C, ... Shape/State n.

        If I understand your posted document, then the idea is to define a specific set of interactions (regardless of the finer specifics of structure) that would allow for a straightforward means of generating that subset of sequences which satisfy the “switch”. Using the same puzzle as above I followed steps used for determining domains where classical motifs (e.g., MS2/Trp) are not specifically identified:

1)      Focus on the unbound state first (or perhaps just the first state) and identify domains (5’-3’) in this state.

2)      Internal loops as well as bulges are counted as part of the stack for the purpose of defining domains.

3)      Hairpin Loops, multiloop junctions, and unpaired ends terminate domains

4)      Domains in State 2 are then defined (5’-3’).

5)      These domains for State 2 are then applied to State 1.
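Steps 2 and 3 above can be sketched in Python as follows. This is only my reading of those two rules, not the algorithm from the linked document: a stem keeps growing through bulges and internal loops (exactly one enclosed helix) and stops at a hairpin loop (none) or a multiloop (two or more). The names `pair_table`, `children` and `stems` are hypothetical.

```python
def pair_table(db):
    """Return a list where pt[i] is the partner of base i, or -1 if unpaired."""
    stack, pt = [], [-1] * len(db)
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            j = stack.pop()
            pt[i], pt[j] = j, i
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return pt


def children(pt, i, j):
    """Outermost base pairs lying strictly between positions i and j."""
    kids, k = [], i + 1
    while k < j:
        if pt[k] > k:
            kids.append((k, pt[k]))
            k = pt[k] + 1  # skip everything enclosed by this pair
        else:
            k += 1
    return kids


def stems(db):
    """Group base pairs into stems; bulges and internal loops do not split a stem."""
    pt = pair_table(db)
    result, todo = [], children(pt, -1, len(db))
    while todo:
        i, j = todo.pop(0)
        stem, kids = [(i, j)], children(pt, i, j)
        while len(kids) == 1:  # stack, bulge or internal loop: same stem
            i, j = kids[0]
            stem.append((i, j))
            kids = children(pt, i, j)
        result.append(stem)
        todo.extend(kids)      # multiloop branches start new stems
    return result


print(stems("((.((...)).))"))  # prints [[(0, 12), (1, 11), (3, 9), (4, 8)]]
```

Each stem would then contribute two strand domains, its 5' side and its 3' side; positions here are 0-based.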

 

(To see image intended to be included, click here: http://prnt.sc/emje82 )

   The question I then have is: do the State 2 domains override (redefine) the State 1 domains, and must one state be defined exclusively in terms of the other, or can we have a composition?

 

   That is, would we have:

 

State 1:

                             [C1,C6], [C1,C5], [C2,C5], [C2,C4], [B1, C4], [B1,B2], [C3,B2], [B2,C3], [B2,B1], [C4,B1], [C4,C2], [C5,C2], [C5,C1], [C6,C1]

 

                       or something like

 

                             [B1(C1:C2:C3), B2(C4:C5:C6)], [B2(C4:C5:C6), B1(C3:C2:C1)]

 

 

For State 2: [C1,C6], [C2,C3], [C3,C2], [C4,C5], [C5,C4], [C6,C1]

 

Would we define State 2’s folding as above (with repetitious associations), or would we purposefully name each domain such that [C2,C3], [C3,C2] would initially be named, e.g., [C2,C2], so that there is no need to write it out as [C2,C2], [C2,C2]?
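To make the question concrete, here is a small Python sketch that generates such association lists from a dot-bracket string and a set of named domain spans. The 0-based inclusive spans and all names (`domain_interactions`, the toy domains A and A') are assumptions for illustration; the output deliberately keeps the repetitious form, with one entry per domain in 5' to 3' order.

```python
def pair_table(db):
    """pt[i] is the partner of base i, or -1 if unpaired."""
    stack, pt = [], [-1] * len(db)
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            j = stack.pop()
            pt[i], pt[j] = j, i
    return pt


def domain_interactions(db, domains):
    """List (domain, partner_domain) tuples, reading domains 5' to 3'.

    domains maps a name to a (start, end) span, 0-based and inclusive.
    """
    pt = pair_table(db)

    def owner(pos):
        for name, (start, end) in domains.items():
            if start <= pos <= end:
                return name
        return None  # position lies outside every named domain

    out = []
    for name, (start, end) in sorted(domains.items(), key=lambda kv: kv[1]):
        partners = []
        for i in range(start, end + 1):
            if pt[i] != -1:
                p = owner(pt[i])
                if p is not None and p not in partners:
                    partners.append(p)
        out.extend((name, p) for p in partners)
    return out


toy = {"A": (0, 2), "A'": (6, 8)}
print(domain_interactions("(((...)))", toy))  # prints [('A', "A'"), ("A'", 'A')]
```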

 

 

       Let me know if I am understanding all of this correctly. Thank you,

 

                                                                                                -cynwulf28


cynwulf28

 Thank you Brourd, this helped to clarify a bit of misunderstanding I had. I do have another question though: how is domain nomenclature determined? Aside from predefined domains (e.g., MS2), I see that x' precedes x" (5'-3'), but for letters used for novel domains, are these letters arbitrarily chosen or is there a convention?

Brourd

It could be arbitrary (such as using the alphabet in an ordered fashion), or it could be related to the name of the signal (M for MS2, T for tryptophan, F for FMN, etc.), or it could be based on whether it's a signal or ligand (S for signal, L for ligand).

As for x' preceding x'', that would be a misconception. X' is the domain complementary to domain X, and X'' would be the domain complementary to domain X', and it could be 5' or 3' of the original domain.

cynwulf28

Interesting. While I ponder this (and wonder to what extent a knowledge of combinatorics would prove necessary here), I would like to share this link I found:

https://www.tbi.univie.ac.at/RNA/ViennaRNA/doc/html/group__grammar.html 

It is through the RNAfold (Vienna) site. I admit I have only briefly looked at it thus far.

cynwulf28

I recognize that the end-goal for this "grammar" is focused primarily on secondary structure (shape) prediction rather than specifically with patterning interactions more generally, but thought it might be useful :-)  

eternacac

  • 274 Posts
  • 19 Reply Likes
Yes, thanks for the description of your method and notation. I had been meaning to ask you about it since I find your designs are often easy to push into EteRNA stable designs for all three computer engines with often very few modifications if they are not already stable in game.

Also, in reading your document, I'm not certain that a particular set of structures (as in dot-bracket notation) will be necessary; rather, shape classes abstracted just above the structure level may be sufficient, as in RNAshapes level 2 or 3. My quick read of your proposal indicates it identifies base pairs and helices that form in each state but doesn't require identifying what happens 'in between' those areas, so many subtle structural details may be undefined and unnecessary.

Brourd

So, to address your last point, the position of the domains can be aligned with a countable set, allowing the measurement of distance between those two domains. Those bases between the two domains can then have their positions aligned with the predicted secondary structure pulled from the game, and you can identify the number of unpaired bases, helices, etc.
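A minimal Python sketch of that measurement, assuming domains are given as 0-based inclusive (start, end) spans and the predicted structure is a dot-bracket string. The function name `gap_summary` and the toy input are made up for illustration.

```python
def gap_summary(db, dom_a, dom_b):
    """Describe the bases lying between two domain spans.

    Spans are (start, end), 0-based and inclusive, in either order.
    """
    (_, first_end), (second_start, _) = sorted([dom_a, dom_b])
    gap = db[first_end + 1:second_start]  # empty if the spans touch or overlap
    unpaired = gap.count(".")
    return {"length": len(gap), "unpaired": unpaired, "paired": len(gap) - unpaired}


print(gap_summary("((..))...((..))", (0, 5), (9, 14)))
# prints {'length': 3, 'unpaired': 3, 'paired': 0}
```

Counting whole helices inside the gap would need the pairing partners as well, which this sketch does not track.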

As for your first point, I'm confused on what you mean by "I'm not certain that a particular set of structures (as in dot-bracket notation) will be necessary, rather shape classes abstracted just above the structure level may be sufficient as in RNA shapes level 2 or 3." Do you mean the secondary structures aren't necessary for identifying domain interactions? I would counter that the idea here is to identify the interactions of the MFE structure(s) predicted within Eterna. If you wanted to abstract domain interactions for the entire ensemble, that requires a bit more work, and a number of functions that I don't believe are available in Eterna's scripting language. However, if you were able to determine the general base pairing map of the entire ensemble, you should be able to get an average base pairing map for designs.

eternacac



From Single State Labs
Brourd's "T"-M-T'-T-MS2"
AGCCACUCGAGUGAUCCUCACGAGUGGCAACGAGGACCGGUACGGCCGCCACUCGAAUAACAUGAGGAUCACCCAUGUAAAUAAU
State2 .(((((((..(((.....))))))))))..((((...(((.....)))...))))....(((((.((....)))))))....... 85
----------------------------------------------------------------------------------------------
State2 .(((((((..(((.....))))))))))..((((...(((.....)))...))))...((((((.((....))))))))...... 85
AGCCACUCGAGUGAUCCUCACGAGUGGCAACGAGGACCGGUACGGCCGCCACUCGAAUAACAUGAGGAUCACCCAUGUUUAUAAU
#01 Tryptophan A - Same State (Malcolm) nt's (79,80) changed to U.
http://www.eternagame.org/game/solution/7559747/7613625/copyandview/

expanded nomenclature T2"-M-T2'-T1-T2-Ms2
State1 [M]|[Ms2], [T2']|[T2], [T1]|[T2], [Ms2]|[M]
State2 [T2"]|[T2'], [M]|[T2'], [T1]|[T2], [Ms2]
note [M]&[T2'] share a single pair in state2, (the 1st pair of [M]) & hinge of 2bulge for [T2'].

The two structures are not identical, they vary in structure, i.e., the length of the Ms2 hairpin in state 2, and the dangles on each end.
Ms2 maps to two different structures in the respective state2's but maps to only one shape (hairpin of indeterminate length & dangles).
Structures vary within their shape class, not a reciprocal operation.
The method appears to be a shape mapping classification more than a structure mapping one.

Just my opinion. Not sure it is all that helpful. :)

Brourd

In this instance, I guess it would depend on the use of the word structure vs shape. The shape of the RNA doesn't have any inherent information about the individual components of the RNA (by your notation), whereas I'm trying to classify the helices of the secondary structure by their function and relationship to each other. IMO, that would be more a classification of structure, rather than shape.

Brourd

And, it's not necessarily a classification of helices. For example, you can take this same procedure and expand it to something like an OpenTB design, where binding sites may be helices, single stranded RNA, or any combination of the two. It's a rather messy affair though. Consider, for example, an oligonucleotide that binds to two separated strands, such as in a multiloop (it gets complicated).

I'd also say that to a certain extent, as the complexity of the switch "increases", that is, the number of second degree and third degree domains increase, that the classification changes as well, but I would guess that the "Shape" would remain the same?

eternacac

From Single State Labs
Brourd's "T"-M-T'-T-MS2"

RNAshapes gives:
AGCCACUCGAGUGAUCCUCACGAGUGGCAACGAGGACCGGUACGGCCGCCACUCGAAUAACAUGAGGAUCACCCAUGUAAAUAAU
-31.60 .(((((((..(((.....))))))))))..((((...(((.....)))...))))....(((((.((....)))))))....... L1 shape is _[_[]]_[_[]_]_[_[]]_ & L3 shape is [[]][[]][[]] & L5 [][][].
.(((((((..(((.....))))))))))..((((...(((.....)))...))))....(((((.((....)))))))....... exact match state2

#01 Tryptophan A - Same State (Malcolm) nt's 79,80 changed to U.
http://www.eternagame.org/game/soluti...

AGCCACUCGAGUGAUCCUCACGAGUGGCAACGAGGACCGGUACGGCCGCCACUCGAAUAACAUGAGGAUCACCCAUGUUUAUAAU
-31.80 .(((((((..(((.....))))))))))..((((...(((.....)))...))))...((((((.((....))))))))...... L1 shape is _[_[]]_[_[]_]_[_[]]_ & L3 shape is [[]][[]][[]] & L5 [][][].

In these switches the structures are different but the shapes L1-L5 are identical. And notice how the L5 & L3 shape descriptors look rather much like your own notation.
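For anyone who wants to play with the idea without installing RNAshapes, here is a rough Python sketch of the most abstract level (level 5) as I understand it: unpaired bases are dropped, and a helix interrupted only by bulges or internal loops counts as one bracket pair. This is an approximation written for this thread, not the RNAshapes implementation, and `shape_level5` is a hypothetical name.

```python
def shape_level5(db):
    """Approximate a level 5 abstract shape string from dot-bracket notation."""
    # Build a tree in which every base pair is a node holding its enclosed pairs.
    root = []
    stack = [root]
    for c in db:
        if c == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif c == ")":
            stack.pop()

    def render(node):
        # A pair enclosing exactly one pair continues the same helix,
        # whether stacked directly or across a bulge or internal loop.
        while len(node) == 1:
            node = node[0]
        return "[" + "".join(render(child) for child in node) + "]"

    return "".join(render(child) for child in root)


print(shape_level5("((.((...)).))"))     # prints []
print(shape_level5("((((..))((..))))"))  # prints [[][]]
print(shape_level5(".((..))..((..))."))  # prints [][]
```

Applied to the three-hairpin structures quoted above, this should give three sibling bracket pairs, in line with the L5 strings listed, though I have not checked it against RNAshapes itself.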

eternacac

Well, that's odd because shape notation was developed to preserve the essential information of helices in RNA and their positional relationship to each other.

It's a question of which kind of mapping (sequence--->structure or sequence--->shape) and which level of abstraction you are using, and which gets you the intended results.

So we disagree about what your method as described and (mis)-understood by me actually does.

I think that coding it as shapes would inherently include the multiple structures that you yourself say can be derived from the method, whereas coding it as structures will likely face difficulties of definition and will not inherently include multiple structures. Either way it still needs coding. :)

I have seen the utility and power of your method and look forward to seeing how it develops.

I believe I can now, at least describe any switch I come across with your nomenclature.

 I see the powers of my persuasion have not improved :). Perhaps some reading on shape theory and use would be better than my efforts?

Thanks for the conversation. It has helped me if not your project :).

Brourd

Yeah, I'm pretty sure at this point it's just a matter of nomenclature and what the meaning of shape vs structure is. To me, this method represents the structural blueprint of the RNA switch, i.e. what you would call its shape.

Are they the same? Probably.

Am I going to call it something other than structure classification? Probably.

A better name for it would be Riboswitch Structural Blueprint Mapping by Domain Classification, or Riboswitch Topology Classification by Domain Identification, but neither of those has a good acronym, so I've not given too much thought to what it should be called.

And thanks! Given the current number of people offering to code it is 0, I suppose it's a ways from being anything more than a document at this point.

eternacac

So, now you have 1st degree, 2nd degree, 3rd degree, nth degree domains...as you say it gets complicated. This I agree with :)

The whole point of shapes analysis is to get rid of minutiae that do not contribute to meaning. Does the method require explicit dot-bracket structure to work? I don't know, but I suspect not. If it does require explicit dot-bracket structure then coding is likely to be more difficult, in my uneducated opinion.

I look forward to learning more about your method in hopes to be able to apply it myself.

Can the method be used for de novo switch development?

In de novo switch development, what do you start with, a target? Do you generate some complementary sequences? Add a reporter? Add some reporter complements? Splice them together and run an MFE for a structure? Then apply the method? Exactly how do you determine the primary, secondary, etc. domains? You mentioned something about perhaps 3 base pairs as a minimum for a helix or helices interaction.

I think a step by step example of generating a de novo switch might help clarify some things. Every tedious detail, since you want to code this.

Brourd

De novo switch development is different from the classification algorithm. You can design sequences that would be a best fit to what the domains state. For example, if domains [N1]|[N1]' base pair, you can generate two sequences with complementary base pairs, and those would fulfill the criteria. If you want to design a secondary structure from scratch just given a couple of domains, inputs, etc. provided in a linear string, the "complexity" of the problem increases quite a bit.

As for explicit dot-bracket structure, it's necessary given our resources, if I understand what you're saying. For example, if I was given a helix GGGAAAGGGAAAACCCAAA, there is no way to identify which GGG pairs with CCC without some prior knowledge of the free energy parameters, which naturally results in the dot-bracket structure being generated along with the MFE structure prediction.
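As a tiny illustration of the [N1]|[N1]' case mentioned above, this Python sketch builds the Watson-Crick reverse complement of a chosen domain sequence. The example sequence is arbitrary, and GU wobble pairs are deliberately ignored.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the strand that pairs with rna, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


n1 = "GGACUC"                      # arbitrary example domain
n1_prime = reverse_complement(n1)
print(n1, n1_prime)                # prints GGACUC GAGUCC
```

Whether the two strands actually pair in the folded design still depends on the energy model, which is the point made above about needing the MFE prediction.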

eternacac

First a link: Abstract Shapes of RNA: https://www.ncbi.nlm.nih.gov/pubmed/15371549

So, in the example I used above Brourd's "T"-M-T'-T-MS2" vs Malcolm mod, they each solve the switch and both are described by your method the same way. But Malcolm's mod increases the Ms2 hairpin length and decreases the dangle length on each side.

How do you define the Ms2 hairpin in each solve? How does your method account for the changes in Malcolm's mod? If I add a few nt's between each "domain" and increase the total length of the switch we still have the same descriptor for the switch correct? If that is true then the domains have non-domains between them correct? But doesn't each of those modifications change the dot-bracket notation of the switch, yet we can recognize the switch as being essentially the same?

I may be in error but I perceive a subtle inconsistency which is what I'm trying to sort out. So, let's say the method does require dot-bracket structure--is it a 1:1 mapping of sequence to the entire structure? It doesn't seem so. The definitions feel like they change to fit the dot-bracket structure of a given sequence. If that were true it could be a problem.

On definitions: a domain is a complementary strand (subsequence) which binds to itself or at least one other strand in at least one of the possible switch states?

(The example above has a domain that is an unbound strand in one state and bound in the other)

Is that correct and sufficient?

Brourd

What do these two sequences give for an analysis of their "shape"?

UAAUAAUUCCUCAUGUCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA

AAUAAUAACAUGGGUGCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA

Their domains would suggest that the M' domain base pairs with two different domains, however, does the shape provide the same information, or does it provide other information about the sequences that are complementary?

So, I'm not sure if you read the document in full. For a short summary, domains are defined by the signal generating region and the ligand binding region. The domains interact with other sequences in their respective OFF states (these would be the N-prime, or N', domains). Then, these domains interact with other sequences in their respective alternative states, and these second degree domains could be signal or ligand regions, N' domains, or second degree (N-double prime, or N''). This cycle is continued until all switching bases have been defined with respect to the signal and ligand binding regions.

So, to summarize the summary, the RNA base pairing structure is classified with respect to the two fixed aspects of the RNA switches we design.

This requires the secondary structure AND sequence. The signal and ligand binding domains are specific sequences, and these are a part of specific secondary structures (usually). And the way these structures switch is how you ascertain the domains complementary to your origin domains.

You can define the length, structure, sequences, etc. of the regions between or after domains, given their secondary structure, and given you know the location of all domains within the defined sequence.
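The cycle summarized above can be sketched, very roughly, as a breadth-first walk over base pairs. This is a per-base simplification written for illustration only: the linked rules document is not reproduced here, the sketch looks at all states in every round rather than alternating between OFF and alternative states, and the names `domain_degrees` and `seed_positions` are hypothetical.

```python
def pair_table(db):
    """pt[i] is the partner of base i, or -1 if unpaired."""
    stack, pt = [], [-1] * len(db)
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            j = stack.pop()
            pt[i], pt[j] = j, i
    return pt


def domain_degrees(structures, seed_positions):
    """Label bases by how many pairing hops separate them from the seed bases.

    structures: one dot-bracket string per state, all the same length.
    seed_positions: 0-based indices of the signal/ligand (origin) bases.
    """
    tables = [pair_table(db) for db in structures]
    degree = {p: 0 for p in seed_positions}
    frontier, d = set(seed_positions), 0
    while frontier:
        d += 1
        reached = set()
        for p in frontier:
            for pt in tables:
                q = pt[p]
                if q != -1 and q not in degree:
                    degree[q] = d  # first time this base is reached
                    reached.add(q)
        frontier = reached
    return degree


state1 = "(((....)))......"
state2 = ".......(((...)))"
labels = domain_degrees([state1, state2], {13, 14, 15})
print(sorted(labels.items()))
```

In this toy example the seed strand (13-15) pairs with 7-9 in the second state, which would be the N' bases, and 7-9 pair with 0-2 in the first state, which would be N''. Bases never reached are not switching with respect to the seed.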

eternacac

Since you didn't specify both states of the switch, but only their sequences, I cannot follow your comments well, although it looks like you are talking about a potential Ms2 switch.

For me to comment intelligently I would need to know which model (Vienna, Vienna2, NuPack) you are referencing and the dot-bracket structures of each state of the switch.

So, here is a fairly complete output of how RNAshapes sees these two sequences in folding mode, arranged by me so you can view them at each level of shape abstraction along with the dot-bracket and mfe. I did not include probability and other analysis.

RNAshapes
Shape folding
Energy range 10% mfe

UAAUAAUUCCUCAUGUCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA
-32.50  ..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  [[[]][[]]]-L3,  [[][]]-L5


AAUAAUAACAUGGGUGCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA
-32.50  ..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  [[[]][[]]]-L3,  [[][]]-L5

1-shape Level 1
        UAAUAAUUCCUCAUGUCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA
-32.50  ..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  _[_[_[]_]_[_[]]_]_
-31.31  .......(((((((((..((((((((.......((.(((...))))))))))))).)))))))))....................  _[_[_[_[]]]_]_
-29.50  ..(((........)))..((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  _[]_[_[_[]_]_[_[]]_]_
-29.41  .......(((((((((..((((((((.......((.(((...))))))))))))).)))))))))........((....))....  _[_[_[_[]]]_]_[]_
-29.30  ..................((((((((....((((...(((.....)))...)))).(((((.........))))).)))))))).  _[_[_[]_]_[]_]_

2-shape Level 1
        AAUAAUAACAUGGGUGCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA
-32.50  ..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  _[_[_[]_]_[_[]]_]_
-31.00  .......((....))...((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  _[]_[_[_[]_]_[_[]]_]_
-29.91  ..........(((((((.((((((((.......((.(((...))))))))))))).(((((.((....))))))).)).))))).  _[[_[_[_[]]]_[_[]]_]_]_
-29.81  ..........((((((..((((((((.......((.(((...))))))))))))).(((((.((....)))))))...)))))).  _[_[_[_[]]]_[_[]]_]_
-29.70  ..........((....))((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  _[][_[_[]_]_[_[]]_]_
-29.33  .......(((((((((..((((((((.......((.(((...)))))))))))))...........)))))))))..........  _[_[_[_[]]]_]_
-29.30  ..................((((((((....((((...(((.....)))...)))).(((((.........))))).)))))))).  _[_[_[]_]_[]_]_

1-shape Level 2
        UAAUAAUUCCUCAUGUCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA
-32.50  ..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  [[_[]_][_[]]]
-31.31  .......(((((((((..((((((((.......((.(((...))))))))))))).)))))))))....................  [_[_[_[]]]_]
-29.50  ..(((........)))..((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  [][[_[]_][_[]]]
-29.41  .......(((((((((..((((((((.......((.(((...))))))))))))).)))))))))........((....))....  [_[_[_[]]]_][]
-29.30  ..................((((((((....((((...(((.....)))...)))).(((((.........))))).)))))))).  [[_[]_][]]

2-shape Level 2
        AAUAAUAACAUGGGUGCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA
-32.50  ..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  [[_[]_][_[]]]
-31.00  .......((....))...((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  [][[_[]_][_[]]]
-29.91  ..........(((((((.((((((((.......((.(((...))))))))))))).(((((.((....))))))).)).))))).  [[[_[_[]]][_[]]]_]
-29.81  ..........((((((..((((((((.......((.(((...))))))))))))).(((((.((....)))))))...)))))).  [[_[_[]]][_[]]]
-29.33  .......(((((((((..((((((((.......((.(((...)))))))))))))...........)))))))))..........  [_[_[_[]]]_]
-29.30  ..................((((((((....((((...(((.....)))...)))).(((((.........))))).)))))))).  [[_[]_][]]

1-shape Level 3
        UAAUAAUUCCUCAUGUCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA
-32.50  ..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  [[[]][[]]]
-31.31  .......(((((((((..((((((((.......((.(((...))))))))))))).)))))))))....................  [[[[]]]]
-29.50  ..(((........)))..((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  [][[[]][[]]]
-29.41  .......(((((((((..((((((((.......((.(((...))))))))))))).)))))))))........((....))....  [[[[]]]][]
-29.30  ..................((((((((....((((...(((.....)))...)))).(((((.........))))).)))))))).  [[[]][]]

2-shape Level 3
        AAUAAUAACAUGGGUGCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA
-32.50  ..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  [[[]][[]]]
-31.00  .......((....))...((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  [][[[]][[]]]
-29.91  ..........(((((((.((((((((.......((.(((...))))))))))))).(((((.((....))))))).)).))))).  [[[[[]]][[]]]]
-29.81  ..........((((((..((((((((.......((.(((...))))))))))))).(((((.((....)))))))...)))))).  [[[[]]][[]]]
-29.33  .......(((((((((..((((((((.......((.(((...)))))))))))))...........)))))))))..........  [[[[]]]]
-29.30  ..................((((((((....((((...(((.....)))...)))).(((((.........))))).)))))))).  [[[]][]]

1-shape Level 4
        UAAUAAUUCCUCAUGUCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA
-32.50  ..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  [[[]][]]
-31.31  .......(((((((((..((((((((.......((.(((...))))))))))))).)))))))))....................  [[]]
-29.50  ..(((........)))..((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  [][[[]][]]
-29.41  .......(((((((((..((((((((.......((.(((...))))))))))))).)))))))))........((....))....  [[]][]

2-shape Level 4
        AAUAAUAACAUGGGUGCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA
-32.50  ..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  [[[]][]]
-31.00  .......((....))...((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  [][[[]][]]
-29.91  ..........(((((((.((((((((.......((.(((...))))))))))))).(((((.((....))))))).)).))))).  [[][]]
-29.33  .......(((((((((..((((((((.......((.(((...)))))))))))))...........)))))))))..........  [[]]

1-shape Level 5
        UAAUAAUUCCUCAUGUCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA
-32.50  ..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  [[][]]
-31.31  .......(((((((((..((((((((.......((.(((...))))))))))))).)))))))))....................  []
-29.50  ..(((........)))..((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  [][[][]]
-29.41  .......(((((((((..((((((((.......((.(((...))))))))))))).)))))))))........((....))....  [][]

2-shape Level 5
        AAUAAUAACAUGGGUGCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA
-32.50  ..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  [[][]]
-31.00  .......((....))...((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  [][[][]]
-29.33  .......(((((((((..((((((((.......((.(((...)))))))))))))...........)))))))))..........  []

Brourd

Here is a link to the two sequences, which used the Vienna2 parameters for determining the MFE structure.

http://www.eternagame.org/game/browse...

http://www.eternagame.org/game/browse...

As I stated further down, the document describing the algorithm makes a distinction for the MS2 domain, and all other signal and ligand binding domains. In essence, the MS2 domain has two sub-domains that describe which strand is interacting with its respective second degree domain. In the case for these two sequences, the M' domain is complementary to M1 in one design, and complementary to M2 in another design. However, based on how you described what a "shape" is, these two distinctions would not exist, considering they are both in similar structural classes, with only different locations for helix formation. *correction Vienna2 Parameters*

eternacac

In my opinion, if you can put your thoughts into pseudo-code it will help anyone who actually codes the algorithm for you. Minutely describe each step of the algorithm, preferably one step, one action per line.

As a poor partial example:

get switch state
get dot-bracket structure
parse structure into elements (ViennaRNA defs)
sort elements by class, bound/unbound
get sequence length of each element
build array {state, element, class, length}
repeat until all states done

get bound elements state1
get element length
no lonely pairs
for each bound element, split length front_L|back_L
build array {state,element, front, back, front_L, back_L}
get sequence for each {state, element, front, back}
assign sequences to {state, element, front, back}
repeat until all states done
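The first half of that outline might look like the following in Python. It is only a sketch: it splits a dot-bracket string into runs of identical characters rather than using the ViennaRNA loop definitions, and the field names are made up for this example.

```python
from itertools import groupby

KIND = {"(": "bound, 5' side", ")": "bound, 3' side", ".": "unbound"}


def elements(state, db):
    """Split a dot-bracket string into runs and record state, start, class and length."""
    rows, pos = [], 0
    for char, run in groupby(db):
        length = len(list(run))
        rows.append({"state": state, "start": pos, "class": KIND[char], "length": length})
        pos += length
    return rows


# Repeat for every state of the switch.
for name, db in {"state1": "((..))", "state2": "......"}.items():
    for row in elements(name, db):
        print(row)
```

A real version would also need the pairing partners to enforce the "no lonely pairs" rule and to split each bound element into its front and back strands.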

Brourd

Sorry. Don't know any pseudo code :(

eternacac

About those two sequences...here is how I would make the first a putative Ms2 switch (trying to emulate your algorithm as I think you would present it):

Domains:
Ms2' = (8-16)
A      = (19-26)
B      = (31-40)
C      = (43-55)
Ms2  = (57-75)
A'      = (77-84)

switch seq [Ms2']-[A]-[B]-[C]-[Ms2]-[A']

state1    [A|A'], [B|C], [Ms2], [A'|A]
state2   [Ms2'|Ms2], [A|C], [B|C], [C|B], [C|A], [Ms2|Ms2']

seq1
UAAUAAUUCCUCAUGUCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA

seq2
AAUAAUAACAUGGGUGCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA

                          *Ms2'*                                            *****Ms2*****
likely State1 ..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).

                            *Ms2'*                                            *****Ms2*****
poss State2 .......(((((((((..((((((((.......((.(((...))))))))))))).)))))))))....................

Did I come close?
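For checking loci like these by eye, a short Python sketch that cuts the subsequence of each domain out of seq1. It assumes the loci are 1-based and inclusive, which is my reading of the table above rather than something stated in it.

```python
SEQ1 = ("UAAUAAUUCCUCAUGUCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGG"
        "AUCACCCAUGUAGCCACUCGA")

# Loci copied from the domain table above (assumed 1-based, inclusive).
DOMAINS = {
    "Ms2'": (8, 16),
    "A": (19, 26),
    "B": (31, 40),
    "C": (43, 55),
    "Ms2": (57, 75),
    "A'": (77, 84),
}


def domain_sequences(seq, domains):
    """Return the subsequence covered by each named domain."""
    return {name: seq[start - 1:end] for name, (start, end) in domains.items()}


for name, sub in domain_sequences(SEQ1, DOMAINS).items():
    print(f"{name:5s} {sub}")
```

With these loci, Ms2' comes out as UCCUCAUGU.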

Brourd

The original system was described in that manner; however, it was determined that not enough information about the switch was described in that notation. The document describes the updated notation, where domains like the MS2 hairpin are described using two separate domains, with one half of the helix being one domain, and the other half being another domain.

Brourd



Here is the domain classification for one of the sequences I provided, in this instance:

(M1)'-(W2)'-(W1)-(W2)-(M1)-(M2)-(W2)''

State1

[M1']|[M1], [W2']|[W2], [W1]|[W2], [W2]|[W1], [W2]|[W2'], [M1]|[M1']

State2

[W2']|[W2''], [W1]|[W2], [M1]|[M2], [W2'']|[W2']
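One way a coder might hold this notation is as a set of unordered domain pairs per state, which also removes the repeated [X]|[Y], [Y]|[X] entries and makes it easy to see what changes between states. A minimal Python sketch, with the hypothetical helper name `parse_interactions`; domains listed alone (unpaired) are ignored by it.

```python
import re

PAIR = re.compile(r"\[([^\]]+)\]\|\[([^\]]+)\]")


def parse_interactions(text):
    """Turn '[X]|[Y], [Y]|[X], ...' into a set of unordered domain pairs."""
    return {frozenset(match) for match in PAIR.findall(text)}


state1 = parse_interactions(
    "[M1']|[M1], [W2']|[W2], [W1]|[W2], [W2]|[W1], [W2]|[W2'], [M1]|[M1']")
state2 = parse_interactions(
    "[W2']|[W2''], [W1]|[W2], [M1]|[M2], [W2'']|[W2']")


def show(pairs):
    return sorted(sorted(p) for p in pairs)


print("kept in both states:", show(state1 & state2))
print("only in State1:", show(state1 - state2))
print("only in State2:", show(state2 - state1))
```

For the classification above, the only pairing kept in both states is W1 with W2.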

eternacac

Excellent! I had originally broken Ms2 into the two domains, but for this discussion I used the consolidated notation.

Do you agree with the loci of my domains? Or would you place them differently?

Is the sequence from a lab? And did I pick correct dot-bracket structures? (I suspect the 2nd is less likely if this is from a lab.) I merely chose the mfe structures that RNAshapes said both sequences had at -32.5 kcal and nearby, and that Vienna, Vienna2, and NuPack could see in Puzzlemaker for one or the other sequence. It would be interesting to see how that compares to any lab if there is one.

eternacac

It appears I picked the exact structures you used in the labs.

puzzle1 UAAUAAUUCCUCAUGUCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA


                  state1 .......(((((((((..((((((((.......((.(((...))))))))))))).)))))))))....................
my poss state2 .......(((((((((..((((((((.......((.(((...))))))))))))).))))))))).................... exact match and L1 shape  _[_[_[_[]]]_]_

Switching seen by scrolling from Vienna to Vienna2.
                                                                  
                   state2 ..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).
my likely state1 ..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))). exact match and L1 shape  _[_[_[]_]_[_[]]_]_

So, I found your puzzle and remade your switch de novo to me starting only from a sequence.
                                                                   

puzzle 2 AAUAAUAACAUGGGUGCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA

Switching seen by scrolling from Vienna to Vienna2. 
       
                              state1 -29.33  .......(((((((((..((((((((.......((.(((...)))))))))))))...........)))))))))..........  _[_[_[_[]]]_]_
from 2-shape Level 1 -29.33  .......(((((((((..((((((((.......((.(((...)))))))))))))...........)))))))))..........  _[_[_[_[]]]_]_ exact match and shape

So, I also found this one, de novo to me starting only with a sequence. And I saw and see the argument you are making but only if I scroll through all three engines because they don't each solve in one engine alone. Switching can be seen from Vienna to Vienna2 as common to both these puzzles.

So we are actually discussing at least 3 structures and 3 shapes. And I found the exact match for each.

If these switches work as the computers say then they do drape MS2 differently from one conformation to another.

Puzzle 1 uses loci M1'(8-16) and puzzle 2 sees it as M2'(8-16), with binding [M1'|M1] for puzzle 1 and binding [M2'|M2] for puzzle 2.

That has nothing to do with shape notation, they do that with dot-bracket notation too!

What shape notation does tell you is that

                     my poss state2 .......(((((((((..((((((((.......((.(((...))))))))))))).)))))))))....................  _[_[_[_[]]]_]_  this dot-bracket matches next below
                     puzzle1  state1  .......(((((((((..((((((((.......((.(((...))))))))))))).)))))))))....................  _[_[_[_[]]]_]_
                     puzzle2  state1  .......(((((((((..((((((((.......((.(((...)))))))))))))...........)))))))))..........  _[_[_[_[]]]_]_  this dot-bracket matches next below
from 2-shape Level 1 -29.33  .......(((((((((..((((((((.......((.(((...)))))))))))))...........)))))))))..........  _[_[_[_[]]]_]_   and All exactly match in shape

but

                       puzzle1 state2 ..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  _[_[_[]_]_[_[]]_]_
                     my likely state1 ..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).  _[_[_[]_]_[_[]]_]_                   these two have a different shape from those above .

Common substructures are _[_[_[  (at each head) and ]]_]_ (at each tail) so _[] of first set and ]_]_[_[ of second set may be of interest.
What are they in dot-bracket?

Let's see:
_[] is, I think, .(((...))), loci(35-45) in first set (the little hairpin) and ]_]_[_[ is )))...)))).(((((.((, loci(46-64) of second set. This latter contains 2 domains M1 after something.

So, shape abstraction let me find 3 of the domains at least approximately by subtraction of common elements.
Photo of eternacac

eternacac

  • 274 Posts
  • 19 Reply Likes
Another thing  shape level abstraction says about these two sequences and switches is:

you can go from this shape  _[_[_[]_]_[_[]]_]_ to this shape _[_[_[_[]]]_]_ and it finds that this dot-bracket
..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))). can switch to these two
.......(((((((((..((((((((.......((.(((...))))))))))))).))))))))).................... and
.......(((((((((..((((((((.......((.(((...)))))))))))))...........)))))))))..........
Photo of eternacac

eternacac

  • 274 Posts
  • 19 Reply Likes
revisiting, editing and expanding...

Common substructures are _[_[_[ (at each head) and ]]_]_ (at each tail) so _[] of first set and ]_]_[_[ of second set may be of interest.
What are they in dot-bracket?

Let's see:
_[] is, I think, .(((...))), loci(35-45) in first set (the little hairpin) B2+B2', and ]_]_[_[ is )))...)))).(((((.((, loci(46-64) of second set. This latter contains 3 domains, B2'+B1'+ M1.

So, shape abstraction let me find many of the domains at least approximately by subtraction of common elements.

Back to the common elements, what do they say?

_[_[_[ in the first set is loci(1-35), M'+A+B1, and in the second set is loci(1-40), M'+A+B1+B2(-).

]]_]_ in the first set it is loci(47-85), something+ B2'+B1'+ M1+ M2+ A'. In the second set it is loci(65-85), 2 domains, M2 + A'.

Let's try dot-bracket commons and subtractions:

puzzle1 UAAUAAUUCCUCAUGUCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA
M'-T'-T-MS2-T"-Design4 (Brourd) http://www.eternagame.org/game/browse...

state1 .......(((((((((..((((((((.......((.(((...))))))))))))).)))))))))....................
state2 ..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).
common ......_______(((((((_...__ .(((..._)))_ ))))___________._____. pairing loci(21-26)A1+A2, (38-40)B2, (46-48)B2',(52-55)B1'.

puzzle 2 AAUAAUAACAUGGGUGCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA
M'-T'-T-MS2-T"-Design4 (Brourd) http://www.eternagame.org/game/browse...

state1 .......(((((((((..((((((((.......((.(((...)))))))))))))...........)))))))))..........
state2 ..................((((((((....((((...(((.....)))...)))).(((((.((....))))))).)))))))).
common ......______ (((((((_...___.(((..._)))_ ))))___._..._)))))))._____. pairing loci(21-26), (38-40), (46-48),(52-55), (69-75)M2.

So dot-bracket gives clean A, B' and partial B2, B2' domains in both switches with clean M2 in second switch only.

Combining both shapes and dot-bracket information I can derive loci(1-20) M', (27-35)B1, (76-85)A'.

So combined I find M'(1-20), A(21-26), B(27-45), B'(46-55), M1(56,64), M2(65,75), A'(76-85).

That looks darned close to your domains. :) But I needed two switches of very similar natures to get there. Bummer :(

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Now instead apply my proposed definition of a domain: a domain is a complementary strand (subsequence) which binds to itself or
to at least one other strand in at least one of the possible switch states (any parenthesis in dot-bracket).

What do we get on these puzzles? First you only need one puzzle not two to get somewhere. Bonus you are strictly dot-bracket.

Puzzle1 state 1 loci(8-16)*, (18-25)*, (33-34)**, (36-37)**, (41-53)*
state2 loci(19-26)*, (31-34)*, (38-40)*, (46-48)*, (52-55)*, (57-61)*, (63-64)*, (69-75)*, (77-84)*

My definition gets them all no fuss no muss, in one easy step. The only structure I need to see is parenthesis of either type.
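
The definition above (any run of parentheses in dot-bracket is a domain) can be sketched in a few lines. This is only an illustration, assuming that a domain is a maximal run of identical bracket characters and that loci are reported 1-based and inclusive; the function name `pairedRuns` is made up for this example.

```javascript
// Sketch: list every maximal run of "(" or ")" in a dot-bracket string
// as a 1-based inclusive [start, end] locus.
// Assumption: a run ends whenever the character changes, so "((" followed
// directly by "))" gives two runs, and any dot also ends a run.
function pairedRuns(structure) {
  const runs = [];
  let i = 0;
  while (i < structure.length) {
    const c = structure[i];
    if (c === "(" || c === ")") {
      const start = i;
      while (i < structure.length && structure[i] === c) {
        i++;
      }
      runs.push([start + 1, i]); // convert 0-based start to 1-based
    } else {
      i++;
    }
  }
  return runs;
}

// Toy example: two nested helices separated by an interior loop.
console.log(pairedRuns("..((..((...))..)).."));
// [ [ 3, 4 ], [ 7, 8 ], [ 12, 13 ], [ 16, 17 ] ]
```

Running it on one state at a time gives a loci list in the same form as the hand-made lists above.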

Show me switches where my definition cannot work, or else why make it difficult?

You might want to let asymmetric 1 bulges be natural dividers of domains, but you might object to M1, M2, M3 resulting. Maybe
a minimum length of 5 base pairs?

Better yet, just make any interior loop or bulge a natural divider. Maybe the same for any hinge )(. Maybe the midpoint of any end loop is a divider.

These are things that should be easy to code and wouldn't drastically affect the end results of your method on the very few puzzles we have
examined here, in my opinion.
Photo of Brourd

Brourd

  • 435 Posts
  • 79 Reply Likes
Hmmm, let's reel everything back for a second. By definition (for this algorithm), domains refer to functional components of the riboswitch. In this instance, you have a ligand binding domain that tryptophan associates with in solution. You also have a signal domain, which in this instance refers to the MS2 binding hairpin sequence.

The reason why this is the case, is that both of these sequences are associated with a fixed structure for their binding site, which players are required to design around. i.e. I can't change the structure of the MS2 binding hairpin, and I can only make some moderate changes to the sequence. Deviating from this will most likely cause the riboswitch to no longer be a riboswitch, given it will no longer be able to associate with the ligands in question.

So, going back to what your suggestion would be, define the domains for this sequence

(Model: Vienna2)

http://www.eternagame.org/game/browse...

Based on your suggestion, the complexity of the switch increases significantly, with what, a minimum of 6 different domains, each having several domains it interacts with? Using the algorithm suggested in the document, there are two major domains, 4 minor domains, 2 second degree domains, and then one significant tertiary domain, with 3 minor tertiary domains that consist of single base pairs?

The idea behind this algorithm is to determine the folding topology of the functional parts of the RNA with respect to the rest of the sequence. From this folding topology, you can derive the structure (or shape) of the "switch." Then, from the rest of the secondary structure, you can determine the distance between the functional parts and the sequences they interact with, as well as any non-interacting regions of the RNA.
Photo of eternacac

eternacac

  • 274 Posts
  • 19 Reply Likes
To quote cynwulf28 from earlier, " However, in its purest form the algorithm presented as an example identifies regions not by a secondary structure, nor ribonucleotide sequence, but rather the algorithm only considers the interactions (by bonding) of
parts of Shape/State A to Shape/State B, State/State C, ... Shape/State n."

Which I agree with.

I will restate it thus: The algorithm declares the relationship across each parenthesis and both switch states, highlighting where one side of a parenthesis in dot-bracket notation in one switch state lands in the other switch state. (Parentheses are the only place bonding takes place, by definition, in dot-bracket notation.)

As long as we are fishing for a solution we shall have to cast and reel it back many times I imagine. :)
 
So, I have read the document several times, and I read your statement above, but I confess I do not follow. I hear your words but I do not understand some subtlety of what your position is. It seems I have some grasp of the idea since I have provided
examples and responded to your examples with no major complaint. My apologies for my dim-wittedness, let us proceed. :)

Yes, my suggestions will technically result in lots more 'domains', but in reality the bonds go where the bonds go, and dot-bracket notation shows exactly where they go. Staying 100% true to it, as I am suggesting here, means you don't have to predefine every unknown motif that may come along. It makes the description more unwieldy, I'll give you that, but it also makes it calculable and codable. And easy to explain even to one like myself. :)

To your challenge:
Same State NG2
Erlarera (ViennaUCT)#PessimisticBindings #-2.5kcalFMN #-0.7kcalMS2 score 100%
http://www.eternagame.org/game/browse/6369196/?filter1_arg2=6428966&filter1=Id&filter1_arg1=...
UGCGGAUAUCGAGGAUAUGCGACGGGUUCUCACAUGAGGAUCACCCAUGUCCGCGGGCCUCAAAAGCAGAAGGCGAGUUCGGUU
[1] -24.40..(((((.(((.(((((((.(...(((((((....)))))))..)))))))).)))(((((.......).))))..)))))... design
[1]           .((((((((((........))).((((((((....))))...)))).)))))))..((((..........)))).......... 84 native in game [1]
[2] -23.90.((.((..(((......(((...((((((.((((((.((....)))))))..).)))))).....))).....)))..)).)). design

note this does not switch according to all three models in game, so in game I only have state [1].

[1] loci(3-7), (9-11), (12-18), (20), (24-30), (35-41), (44-51), (53-55), (56-60), (67), (69-72), (75-79) from posted dot-bracket
native in game [1] .((((((((((........))).((((((((....))))...)))).)))))))..((((..........)))).......... 84
[1] loci(2-8), (9-11), (20-22), (24-27), (28-31), (36-39), (43-46), (48-54), (57-60), (71-74) from native puzzle state in game.

I would say it did not assume the designed state[1] but some alternate, likely the native state displayed in game. I would guess it did not assume design state[2] but again some unknown alternate as it does not switch in game. I would probably need RNA shapes and Puzzlemaker to redefine what I think happened with the sequence.

If I halt here, I would give the loci from the in game native state[1] as the domains. This gives 10 domains overall. But I think it better to ask ourselves if it really did switch into design state[2] or some other alternative state[2] and if an alternative what was it?
Since it got a score of 100% it must have switched into something.

Puzzlemaker gives a NuPack match exactly for the
native in game state[1] .((((((((((........))).((((((((....))))...)))).)))))))..((((..........))))..........   84
It gives two very similar Vienna & Vienna2 native states
            Vienna     state[2] ..(((((..((.(((((((.(...(((((((....)))))))..)))))))).)).(((((.......).))))..)))))... 84
            Vienna2  state[2] ..(((((.(((.(((((((.(...(((((((....)))))))..)))))))).)))(((((.......).))))..)))))... 84
               design  state[2] .((.((..(((......(((...((((((.((((((.((....)))))))..).)))))).....))).....)))..)).)).   84

So I would guess this lab solution switches between the in game Nupack state[1] and one of these Vienna state[2]'s.



Now I can complete the domain classification:

NUPack state[1] loci(2-8), (9-11), (20-22), (24-27), (28-31), (36-39), (43-46), (48-54), (57-60), (71-74) from native puzzle state in game. NuPack state[1], 10 domains.

Vienna2 state[2] loci(3-7), (9-11), (13-19), (21), (25-31), (36-42), (45-52), (54-56),(57-61), (69), (71-74), (77-81) puzzlemaker [2].  Vienna2 state[2], 12 domains.

So I say it solved this way
 NuPack  state[1]  -23.9kcal .((((((((((........))).((((((((....))))...)))).)))))))..((((..........))))..........   84 native in game [1]
Vienna2  state[2]  -25.3kcal ..(((((.(((.(((((((.(...(((((((....)))))))..)))))))).)))(((((.......).))))..)))))... 84 puzzlemaker [2]

Can we consolidate the domains to look like what you use? I don't know.
Photo of Brourd

Brourd

  • 437 Posts
  • 79 Reply Likes
The puzzle is the "same-state" type, and the scoring of the on chip riboswitches is based on a measurement of fluorescence from the binding of the MS2 protein to its respective hairpin. And the binding of a ligand is the mechanism that guides the formation of the MS2 hairpin.

If the RNA ensemble shifted in equilibrium as you suggested with the above screenshot, the MS2 protein wouldn't have been able to associate with the nucleic acid in any meaningful way, and the measurements of fluorescence would be inherently incorrect, given the mechanism for switching is no longer there.



These are the predicted secondary structures that Eterna predicts for that sequence with the Vienna2 parameters.

So, to address your statement about the method not being sequence or structure dependent, that would be partially incorrect. The domains are dependent on the functional segments of the RNA molecule, which have specific sequences and structures associated with them. One issue with your method of naming domains is that you can't gain any information about the functional parts of the RNA.

For example, there will be a domain complementary to the MS2 hairpin that turns it off. What is the free energy of the helix that this complementary sequence forms with the MS2 hairpin? If the functional parts aren't defined, then information like this can't be extracted from the interactions between domains.
Photo of eternacac

eternacac

  • 274 Posts
  • 19 Reply Likes
When I look at it in game it doesn't solve in Vienna2 or in puzzlemaker that way.
So I don't know where you get the images from.

Until we agree on a link that gives us both the same information, we're stuck on this one. I'll check my end again.

Ah! I found the problem. OK, I'll have to start over then. :) But I found a way to consolidate my mini domains that looks remarkably like yours even with the error, so we'll see.

Let me delete the erroneous entries from this log.
Photo of eternacac

eternacac

  • 274 Posts
  • 19 Reply Likes
So I would visually draw the domains like this & check against calculated domains too.

Photo of eternacac

eternacac

  • 274 Posts
  • 19 Reply Likes
Ok, sorry for the error:

Here it is.
Same State NG2 lab
Domain correspondences Brourd -v- CAC
Erlarera (ViennaUCT) 100% score

[1] -24.40..(((((.(((.(((((((.(...(((((((....)))))))..)))))))).)))(((((.......).))))..)))))...
[2] -23.90.((.((..(((......(((...((((((.((((((.((....)))))))..).)))))).....))).....)))..)).)).

Ok got it in puzzlemaker now identical to you.

[1] loci(3-7), (9-11), (12-18), (20), (25-31), (36-42), (45-52), (54-56), (57-61), (69), (71-74), (75-76), (77-81)
[2] loci(2-3), (5-6), (9-11), (19-21), (24-29), (31-36), (38-39), (44-50), (53), (55-60),(66-68), (74-76), (79-80), (82-83)

+ rules begin with state[2] if unclaimed nt adjacent to boundpair of another section add to it.
Gives +1, +3, +8,
state[2]
Gives (+3)+4, +7, +8, +12,
if part of a loop is claimed divide remainder, add to smaller segment
state[1] loops gives +2+2, +7, +12
state[2] loops gives ++12

updated extended loci(2-7), (8-12), (13-22), (23-33), (34-42), (43-52), (53-56), (57-61), (66-70), (71-74), (75-76), (77-81), (82-83)
 
Your loci F1(2-20), M1'(25-31), M1(32-42), M2(43-50), F1'(51-56), F2'(57-65), F2(66-83)
correspondences (F1:1,2,3), (M1':5), (M1:6), (M2:7), (F1':8), (F2':9), (F2:10,11,13,14)

I find everything almost the same as you, with some chunking differences. Works for me and can be coded, imo.


Photo of eternacac

eternacac

  • 274 Posts
  • 19 Reply Likes
On Shape abstractions, the utility comes from the idea that the unpaired parts can be any length and the stacks can be any length and the shape remains the same. At least for this application. Imagine adding or subtracting basepairs and increasing or decreasing loop sizes in the diagram below. See how it would work with your algorithm? I think it's what you are doing but don't realize it.
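
To make the idea concrete, here is a rough sketch of how a level-1-style shape string could be derived from dot-bracket. It is not the RNAshapes implementation; it only assumes the convention visible in the shapes quoted in this thread: each stack of directly nested pairs collapses to one `[` and one `]`, each unpaired run collapses to one `_`, and hairpin loops are left implicit. The names `pairTable` and `shapeLevel1` are hypothetical.

```javascript
// Build a table of pairing partners (0-based), -1 for unpaired positions.
function pairTable(structure) {
  const stack = [];
  const pairs = new Array(structure.length).fill(-1);
  for (let i = 0; i < structure.length; i++) {
    if (structure[i] === "(") {
      stack.push(i);
    } else if (structure[i] === ")") {
      if (stack.length === 0) throw new Error("unbalanced ) at " + (i + 1));
      const j = stack.pop();
      pairs[i] = j;
      pairs[j] = i;
    }
  }
  if (stack.length > 0) throw new Error("unbalanced ( at " + (stack.pop() + 1));
  return pairs;
}

// Assumed level-1-style abstraction: stems and unpaired runs of any length
// map to a single symbol each; hairpin loops are not written out.
function shapeLevel1(structure) {
  const pairs = pairTable(structure);
  const n = structure.length;
  let shape = "";
  let i = 0;
  while (i < n) {
    if (pairs[i] === -1) {
      const start = i;
      while (i < n && pairs[i] === -1) i++;
      // A hairpin loop is an unpaired run closed directly by one pair.
      const isHairpin = start > 0 && i < n && pairs[start - 1] === i;
      if (!isHairpin) shape += "_";
    } else {
      const j = pairs[i];
      if (j > i) {
        // Opening side: new stem unless stacked on the previous pair.
        const stacked = i > 0 && pairs[i - 1] === j + 1;
        if (!stacked) shape += "[";
      } else {
        // Closing side: stem ends unless the next pair stacks outside it.
        const stacked = i + 1 < n && j - 1 >= 0 && pairs[i + 1] === j - 1;
        if (!stacked) shape += "]";
      }
      i++;
    }
  }
  return shape;
}

// Different stem and loop lengths, same shape.
console.log(shapeLevel1("..((..((...))..)).."));             // _[_[]_]_
console.log(shapeLevel1("....(((...((((.....))))....))).")); // _[_[]_]_
```

The two calls at the end show the point made above: lengthening stems or loops changes the dot-bracket string but not the shape.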

Photo of Brourd

Brourd

  • 437 Posts
  • 79 Reply Likes
It's not that I think the abstraction of higher-order structural characteristics isn't useful or lacks utility. However, of what use would it be to define the "shape" of any of the RNA switches we design?

The method of RNA folding topology that is described in the document has several utilities:

1. The information on how the RNA switches is linked to the domains and the order in which they're presented. For example, the M1' domain is always going to be complementary to the first strand of the Ms2 domain, and it describes both the position and length of the structure complementary to the hairpin. That is, if a player is reading the string of domains, they'll know that (M1')-N-(M1) describes without fault the sequence that complements the Ms2 domain and its relative position.

2. The free enthalpy of these helices can be ascertained by the summation of the free enthalpy parameters that are provided within the folding models. That is, the summation of all the free enthalpy numbers for the base pairs and loops within the [M1']|[M1] (as an example) interaction can be derived from this information. In addition, information about the base pair identities and the number of each base pair is something that can be determined.

3. Design utility: While this exists for the shape abstractions and domain system you suggested, it doesn't seem like it would be possible to determine the position of the ligand and signal binding domains with your methods. That is, you'll have the "shape" and a bunch of numbered domains, but neither gives any indication of where the Ms2 hairpin would be located for a sequence that requires the MS2 hairpin as a signal, or any information on ligands that could associate with the RNA and be the mechanism for switches.
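
As a small illustration of point 2, the base pair identities within an interaction can be counted straight from the sequence and the dot-bracket string. This sketch does not compute any free energy, since that would need the folding model's own parameters; it only tallies pair types. The function name `countPairTypes` and the optional `from`/`to` range (1-based positions of the 5' strand of the helix of interest) are assumptions made for this example.

```javascript
// Sketch: count GC, AU and GU pairs whose 5' partner lies in [from, to].
// Positions are 1-based; by default the whole structure is counted.
function countPairTypes(sequence, structure, from = 1, to = structure.length) {
  if (sequence.length !== structure.length) {
    throw new Error("sequence and structure differ in length");
  }
  const counts = { GC: 0, AU: 0, GU: 0, other: 0 };
  const stack = [];
  for (let i = 0; i < structure.length; i++) {
    if (structure[i] === "(") {
      stack.push(i);
    } else if (structure[i] === ")") {
      if (stack.length === 0) throw new Error("unbalanced ) at " + (i + 1));
      const j = stack.pop(); // 0-based index of the 5' partner
      if (j + 1 < from || j + 1 > to) continue;
      // Sort the two letters so that G-C and C-G are counted together.
      const pair = [sequence[j], sequence[i]].sort().join("");
      if (pair === "CG") counts.GC++;
      else if (pair === "AU") counts.AU++;
      else if (pair === "GU") counts.GU++;
      else counts.other++;
    }
  }
  return counts;
}

// Toy hairpin: two G-C pairs closing two A-U pairs.
console.log(countPairTypes("GGAUAAAAAUCC", "((((....))))"));
// { GC: 2, AU: 2, GU: 0, other: 0 }
```

Passing the loci of a domain such as M1' as `from` and `to` would restrict the tally to the [M1']|[M1] helix.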
Photo of eternacac

eternacac

  • 274 Posts
  • 19 Reply Likes
Did you look at the shapes diagram and the Level1 representation? It is extremely similar to the dot-bracket notation, the difference being that it allows for the extension of stems and loops, which gives the idea of 'shape' some flexibility that a rigid reliance on dot-bracket notation lacks. Further up the levels of abstraction you get the flexibility to view 'shape' as crossing boundaries like 1,1 bulges or internal loops. You don't currently see its potential and prefer strict dot-bracket structures. OK.

Domain class ID-2-eternacac
loops as 1st pass filter
loop>2 separates domain

state [1] domains (2-20), (25-42), (44-60), (69-83)
state [2] domains (2-11), (18-20), (24-60), (66-68), (74-83)
//looks good so far//

2nd pass filter

find hairpins, stems_w/terminal_loop,
state[1] (25-42), (57-74)
state[2] (32-50)
//still ok//

3rd pass filter

locate hairpin in domain cluster & splits
state[1] (25-42)*--> (25-42);
(44-60)<(57-74)*<(69-83) --> (44-56), (57-60), (61-68), (69-74), (75-83)
state[2]  24<(32-50)*<60 --> (24-31), (51-60)

4th pass filter
splits from both states
(2-11)<(2-20)--->(12-20),
(24-31)<(44-56)--->(32-43),
(32-43)<(32-50)-->(44-50),
(44-56)<(51-60)-->(44-56),(51-56),(57-60)

//that's everything//

domains (2-11:12-20), (24-31), (32-43), (32-50),(44-50), (51-56), (57- 60), (61-68), (69-74), (75-83)

this works pretty well for this puzzle :) gives you your big sections much as you might find them. Is this more acceptable?

Photo of Brourd

Brourd

  • 437 Posts
  • 79 Reply Likes
I believe the disconnect right now is in that you're trying to convince me that the "shape" notation is useful. Again, I stated that it has utility, but not for the application at hand. To use what you wrote as an example:

Your domains in order would be (1)-(2)-(M1)-(M2)-(3)-(4)-(5)-(6)-(7)

If I were to give you this string of characters, would you be able to design an RNA switch? The answer to that, I imagine, would be complicated.

Does this string describe a nucleic acid that changes its structure in response to the MS2 protein, or does it contain other functional parts that also change structure in response to other ligands?

What is the order in which these domains interact with each other? Does domain 1 base pair with domain 2, the Ms2 hairpin, or domain 5?

The reason why the notation I'm using exists is to be able to do these things. If a nucleic acid is described by a random domain string like (L1'')-(S1')-(L1)'-(S1)-(S2)-(L1)-(L2) (where L represents a ligand binding domain, and S represents a signal generating domain), that information is inherent to the string. For example, (L1') is *going* to be complementary to (L1), and (L1'') is going to be complementary to (L1'). These are inherent to the notation itself. In addition, while you may not necessarily be able to design an RNA switch with just this character string, you now have enough information to be able to provide the position of both functional parts of the domain and to which domain they are initially complementary.
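
To show how much a program can read off such a string by itself, here is a minimal sketch of a parser for the domain notation. It assumes tokens are separated by hyphens, that a token is a name in round or square brackets, and that primes may sit inside or outside the bracket, as in the examples in this thread. The function name `parseDomainString` and the returned field names are hypothetical.

```javascript
// Sketch: parse "(L1'')-(S1')-(L1)'-(S1)" into domain records.
// Each prime means "complement of the same name with one prime fewer".
function parseDomainString(domainString) {
  return domainString.split("-").map((token) => {
    const m = token.trim().match(/^[(\[]([A-Za-z]+\d*)(['’]*)[)\]](['’]*)$/);
    if (!m) throw new Error("unrecognised domain token: " + token);
    const name = m[1];
    const primes = m[2].length + m[3].length;
    // (L1'') complements (L1'), (L1') complements (L1), (L1) has no implied partner.
    const complementOf = primes > 0 ? name + "'".repeat(primes - 1) : null;
    return { name, primes, complementOf };
  });
}

const domains = parseDomainString("(L1'')-(S1')-(L1)'-(S1)-(S2)-(L1)-(L2)");
console.log(domains.map((d) => d.complementOf));
// [ "L1'", 'S1', 'L1', null, null, null, null ]
```

The order of the returned array is the 5' to 3' order of the domains, so relative position comes for free.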

This is why I don't see much use in using numbers to describe domains. It doesn't seem to provide any information besides the inherent number of domains, in my opinion, but perhaps you can describe how this result would be useful to either a player or possibly another algorithm reading this sequence string.
Photo of eternacac

eternacac

  • 274 Posts
  • 19 Reply Likes
So your difficulty is with using numbers for labels instead of F1, F2, M1, M2? Not with the domains I found? I thought you were looking to code an algorithm to express your method---wanted a way to find the 'functional' parts of the sequence so that you could automate the labeling and state pairing groups.

Anyway, did my recently submitted process instructions and picture identify the placement of the 'functional' segments in each state well enough for you? You can then label the identified segments however makes the most sense for you.

And I agree an output that says F1, F2, M1, M2 is more comprehensible to humans and conveys the sense much better. That detail can be dealt with. First an automated method for identifying 'functional groups' that mimics what you draw on your diagrams and has a reasonable logic must be found. That is what I'm working towards. Labels 1,2,3 or F1, F2, M1, M2 have no use until the logic of how you identify the structures works to your satisfaction or can't be distinguished by another from your own work. I don't recall you saying how you identify the structures and their co-mingling  into 'functional groups'---that is what I thought we were working on.

I think my last submission is close for that particular puzzle and I may have a minor improvement.
Photo of eternacac

eternacac

  • 274 Posts
  • 19 Reply Likes
So you understand,

I get the computer or person is given a  dot-bracket representation of the entire switch sequence. And the labels and structures for targets like tryptophan are provided to the human or computer as well as for the signal like Ms2. And I understand the method says split these to better describe the switch. And these labels will be part of final output to a string with other symbols describing the complementary relations to target and signal first, then to secondary relations until all the active components of the switch have been documented.

The computer needs a way to implement this without it already being completely done for it. Isn't that the help you ask for? So the computer will always get target and signal right because it is predefined for it. The computer will search and match these elements and return their positions in the string if they are not specified. The computer will identify each complementary pairing in the entire structure based on the inherent nature of dot-bracket notation.

So I think what you want is a string matching & substitution output. You want a report.
And you want it to be in the format in your document. That's why the numbers were bothering you. Oh well, these misunderstandings happen.

This is all about list and string manipulation. Completely different than what I thought you were after. No wonder the communications problem.

Might be able to cheat & use puzzlemaker to do it, maybe. Hmmm...


                    
                  
Photo of eternacac

eternacac

  • 274 Posts
  • 19 Reply Likes
Well I got my first attempt at hand stepping through some pseudocode to give me a correct result for a truncated puzzle with this output [F1]-[M1']-[M1]-[M2]-[F1']-[F2']-[F2]. Hooray, means a real coder should be able to do this. I did not yet work on the state1 & 2 bonding maps. And somehow managed to lose the dang pseudocode!!!

But now I understand what you want and have hand stepped it as pseudocode I'm sure it can be done by someone fairly easily. Maybe not me, but someone.

I'll keep messing with it and see if I can actually pull it off, but a real coder would already be done, I believe.
Photo of eternacac

eternacac

  • 274 Posts
  • 19 Reply Likes
For an actual script, things I'll likely need:

Snippets to assemble

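//maybe use this to find parentheses as they open and close (count is the current nesting depth)
var count = 0;
for (var i = 0; i < structure.length; i++) {
    if (structure[i] == "(") {
        // do something at an opening parenthesis
        count += 1;
    } else if (structure[i] == ")") {
        // do something at a closing parenthesis
        count -= 1;
    }
}
// end section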

//or do parentheses this way

// finds each open parenthesis
for (var i = 0; i < structure.length; i++) {

  if (structure[i] != "(") {
    continue;
  }
  // finds the opposite side of the parenthesis
  // (find_pair is a helper that still has to be defined elsewhere)
  var other_pos = find_pair(i, structure);
  // compare it here?
}
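
The loop above calls `find_pair`, which is not defined anywhere in these snippets. A minimal sketch of what such a helper could look like follows; the signature is taken from the call above, but the behaviour (0-based indices, returning -1 when there is no partner) is an assumption for illustration.

```javascript
// Hypothetical helper: return the 0-based index of the ")" that closes
// the "(" at position pos, or -1 if pos is not an open parenthesis or
// the structure is unbalanced.
function find_pair(pos, structure) {
  if (structure[pos] !== "(") {
    return -1;
  }
  var depth = 0;
  for (var k = pos; k < structure.length; k++) {
    if (structure[k] === "(") {
      depth += 1;
    } else if (structure[k] === ")") {
      depth -= 1;
      if (depth === 0) {
        return k; // nesting is back to the starting level
      }
    }
  }
  return -1;
}

console.log(find_pair(0, "((..))")); // 5
console.log(find_pair(1, "((..))")); // 4
```

It rescans from `pos` on every call, which is slow for long structures but easy to follow; a single stack pass that builds a whole pair table would be the faster alternative.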

some Flash API's needed

Getters  & Setters
get_sequence_string()
Parameters: none
Returns:
a string representing the sequence

set_sequence_string(seq)
sets the sequence in the puzzle
the applet recomputes foldings, free energies and reevaluates constraints
Parameters:
seq: the sequence
Returns
a boolean, true if successful, false otherwise (the operation may fail if seq contains invalid characters, for instance)

set_tracked_indices(marks)
sets all black marks in the puzzle
Parameters:
marks: an array of indices of bases to be marked
Returns: nothing

get_full_sequence(index)
Parameters:
index: 0-based index of the state to be queried
Returns:
a string representing the full-length sequence, including the oligos in the case of a multistrand puzzle

get_locks()
Parameters: none
Returns:
a 0-based array of booleans indicating locked (true) and mutable (false) positions in the puzzle

get_targets()
Parameters: none
Returns:
an array of objects describing the structural constraints of each state in the puzzle (details added later)

get_native_structure(index)
Parameters:
index: 0-based index of the state to be queried
Returns:
the dot-bracket representation of the predicted MFE for the queried state in the puzzle (i.e. the native fold)

get_full_structure(index)
Parameters:
index: 0-based index of the state to be queried
Returns:
the complete dot-bracket representation of the predicted MFE, including oligos, for the queried state in the puzzle (i.e. the native fold)
Photo of eternacac

eternacac

  • 274 Posts
  • 19 Reply Likes
This should work in any find-and-replace word processor and/or spreadsheet as a macro:

RSS-reporting sequence style for switches.

Search and replace to generate Brourd's sequence reporting style for switches (for his algorithm).

1) Get sequence.

2) Get dot-bracket state 1.

3) Get dot-bracket state 2.

4) Get target patterns FMN, Tryptophan, Threonine, Arginine, oligo's, new.

5) Get indicator signal patterns, Ms2, kissing loop, reporter, new.

6) Split target and signal patterns in halves, labeled target or signal 1 and 2.

7) Find & replace target 1 nt's with F1 in sequence.

8) Find & replace target 2 nt's with F2 in sequence.

9) Find & replace signal 1 nt's with M1 or (K1, R1, etc.) in sequence.

10) Find & replace signal 2 nt's with M2 or (K2, R2, etc.) in sequence.

11) Color each group of F1, F2, M1, M2 a separate color in sequence and dot-brackets states 1 and 2.

12) Observe state 2 dot-bracket for unlabeled and uncolored open and closed parenthesis pair groups.

13) Observe state 1 dot-bracket for interaction with any unlabeled, uncolored pair groups from state 2.

14) Where an unused, uncolored pair group of state 2 binds in state 1 with any previously labeled group, label that pair group of state 2 with the state 1 name as F1', F2', M1', M2' (complements).

15) Color each F1', F2', M1',M2' found with separate color.

16) Observe state 1 dot-bracket for any unlabeled uncolored open and closed parenthesis groups.

17) Observe state 2 dot-bracket for interaction with any unlabeled, uncolored pair groups from state 1.

18) Where an unused, uncolored pair group of state 1 binds in state 2 with any previously labeled group, label that pair group of state 1 with the state 2 name as the (Name)' complement.

19) Color each new group a separate color.

20) Repeat until all open and closed parenthesis groups can be labeled.

21) Any remaining nt's label N.

22) Condense each named group into a single entry of the form "[label]". Hyphenate between groups.

23) [N] groups may be removed.


This is the BSS representation of the switch string.

An example: [T]’’-[M]’-[T]’-[Trp]-[Ms2].

Another example: [F1]-[M1]’-[M1]-[M2]-[F1]’-[F2]’-[F2].


24) Observe state 1 and state 2 group interactions for each state separately and report as [group1|group2], [group3|group4], etc. for that state giving string and state representation as:

switch type [T]’’-[M]’-[T]’-[Trp]-[Ms2]

state 1: [M]’|[Ms2], [T]’|[Trp] and state 2: [T]’’|[T]’, [Trp], [Ms2].
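
The last step (reporting which groups pair in each state) is the easiest one to hand to a script once the groups have loci. Below is a minimal sketch, assuming the labeled groups are already available as 1-based inclusive ranges; the function name `pairingReport` and the `{ label, from, to }` record layout are made up for this example, and each interaction is listed once with the 5' group first.

```javascript
// Sketch: given a dot-bracket state and labeled groups, report the
// group-to-group interactions as "[A]|[B]" strings (each listed once).
function pairingReport(structure, groups) {
  const groupAt = (pos) => groups.find((g) => pos >= g.from && pos <= g.to);
  const stack = [];
  const seen = new Set();
  const report = [];
  for (let i = 0; i < structure.length; i++) {
    if (structure[i] === "(") {
      stack.push(i);
    } else if (structure[i] === ")") {
      if (stack.length === 0) throw new Error("unbalanced ) at " + (i + 1));
      const j = stack.pop();
      const a = groupAt(j + 1); // group holding the 5' partner
      const b = groupAt(i + 1); // group holding the 3' partner
      if (a && b) {
        const key = "[" + a.label + "]|[" + b.label + "]";
        if (!seen.has(key)) {
          seen.add(key);
          report.push(key);
        }
      }
    }
  }
  return report;
}

// Toy switch: one helix made of two stacked group interactions.
const groups = [
  { label: "X", from: 1, to: 2 },
  { label: "Y", from: 3, to: 4 },
  { label: "Y'", from: 9, to: 10 },
  { label: "X'", from: 11, to: 12 },
];
console.log(pairingReport("((((....))))", groups).join(", "));
// [Y]|[Y'], [X]|[X']
```

Calling it once per state with the same group list gives the two per-state lines of the report.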

prepared by Chris Couteau, aka eternacac 1April2017
Photo of eternacac

eternacac

  • 274 Posts
  • 19 Reply Likes

RSS switch reporter form

make & use find/replace macro


Inputs:

RNA switch sequence:

state1 structure:

state2 structure:


Target sequences used:

Same State Tryptophan A: CGAGGACCGGUACGGCCGCCACUCG

Same State Theophylline A: GAUACCAG & CCCUUGGCAGC

Same State Arginine A: GAAGGAGCA & CAGGUAGGUCAC

Exclusion Tryptophan A: CGAGGACCGGUACGGCCGCCACUCG

Exclusion Theophylline A: GAUACCAG & CCCUUGGCAGA

Exclusion Arginine A: GAAGGAGCG & CAGGUAGGUCGC

Same State Tryptophan B: CGGCCGCCACU & GGACCGGG

Same State Theophylline B: GAUACCAG & CCCUUGGCAGC

Same State Arginine B: GAAGGAGCG & CAGGUAGGUCGC

Exclusion Tryptophan B: CGGCCGCCAUU & GGACCGGG

Exclusion Theophylline B: GAUACCAG & CCCUUGGCAGC

Exclusion Arginine B: GAAGGAGCA & CAGGUAGGUCGC


Signal sequences used:

Ms2 aptamer: ACAUGAGGAUCACCCAUGU Alt_Ms2 forms available

Kissing loop: AGUGAUGUU

Reporter 1:

Reporter 2:


workspace & output


RNA sequence:

state 1:

state 2:

RSS switch type:

state 1:

state 2:
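
For the find-and-replace part of the form, locating a known target or signal sequence and splitting it into two labeled halves might look like the sketch below. The split rule (the first half takes the extra nucleotide when the length is odd) is an assumption, as is the function name `labelMotif`; the thread does not fix where a motif such as Ms2 should be divided.

```javascript
// Sketch: find a motif in the switch sequence and label its two halves.
// Returns 1-based inclusive loci, or null if the motif is absent.
function labelMotif(sequence, motif, label) {
  const start = sequence.indexOf(motif); // first occurrence only
  if (start === -1) {
    return null;
  }
  const half = Math.ceil(motif.length / 2); // assumed split point
  return [
    { label: label + "1", from: start + 1, to: start + half },
    { label: label + "2", from: start + half + 1, to: start + motif.length },
  ];
}

// Toy example with a 4-nt motif.
console.log(labelMotif("AAACGCGUUU", "CGCG", "M"));
// [ { label: 'M1', from: 4, to: 5 }, { label: 'M2', from: 6, to: 7 } ]

// Same call on puzzle 1 from this thread with the Ms2 aptamer from the form.
const puzzle1 =
  "UAAUAAUUCCUCAUGUCACGAGUGGCAUAACGAGGACCGGUACGGCCGCCACUCGAACAUGAGGAUCACCCAUGUAGCCACUCGA";
console.log(labelMotif(puzzle1, "ACAUGAGGAUCACCCAUGU", "M"));
```

Two-part targets (the ones written with "&" in the form) would need one call per part rather than a split.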