Trouble aligning experimental results

  • Question
  • Updated 3 years ago
  • Answered
  • (Edited)
I'm having trouble aligning some of the experimental results. I've been assuming that all of the synthesized results had a missing 5-base GGAAA prefix, but it seems like that may not always be the case (player projects?). Is there something in the lab header data that should tell me if it has the prefix or not, or how long it is?

In particular, "nid":"17320","soln":22140 works for my code, but "nid":"751755","soln":752853 does not.

Thanks.
jandersonlee

Posted 6 years ago

Brourd
The full sequence for "nid":"751755","soln":752853 would be GGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGUCAGGACACGGAAACGUGACGCGACCAAAAUCAGCGUUCGCGCUGAUAAAAGAAACAACAACAACAAC. Unfortunately, I'm not entirely sure how they determined the dangling end sequences, or whether they all happen to be poly-adenine sequences.

JR
I had the same problem. I dumped mine into Excel and it looked like one data column was missing: I parsed the sequence and shape numbers and they did not line up. I removed the front and end strings.
I was going to double-check before posting. It may take a few days, so don't get excited about it on my behalf just yet. I am assuming it is me.
(This is data from JL's lab results query.)

JR
I double-checked and am still missing 1 data point. I removed GGAAA from the beginning and there was data up to, but not including, the last NT of the barcode.
In my example it was CC"C", so it was hard to miss.
Is the first data point in all my data = 1? I will verify this again.
Next I will manually match my Excel data to the lab screen data (hovering the mouse over the sequence gives me the data point) and will report back any discrepancies in a day or two.
Thanks

JR
Looks like you are 6 NTs off.
To duplicate: lab 3105090, Brourd 3182519

1, 1.0362, 1.0569, 1.5321, 0.942, 0.9283, 0.5388, 0.1147, 0.2628, 0.0193, -0.0017, 0.0455, 0.2156, 0.0999, 2.4576, 0.2383, 0.0433, 0, 0, 0.1081, 0.2368, 0.086, 0.0001, 0.0598, 0.0214, 0.2614, 0.9117, 0.3176, 0.2537, -0.0055, 0.0621, 0.0207, 0.0414, 0.046, 0.0665, 0.0869, 0.0046, 0.0148, 0.2759, 1.4084, 0.1594, 0.1321, 0.0199, -0.0536, 0.3283, 0.084, 0.0151, 0.2684, 0.8529, 0.4854, 0.1353, 0.1891, 0.2073, 0.0104, 0.169, 0.0188, 0.0563, 0.4931, 1.2565, 0.4162, 0.2063, 0.1439, 0.1256, 0.3252, 0.213, 0.3881, 1.0919, 0.9769, 0.874, 0.3383, 0.3323, 0.0702, 0.0611, 0.1155, 0.0495, 0.217, 1.3963, 0.4671, 0.865, 0.6511
.,(,(,(,(,(,(,(,.,(,(,(,(,.,.,(,(,(,(,.,.,.,.,),),),),.,.,),),),),.,(,(,(,(,(,(,(,.,.,.,.,),),),),),),),.,),),),),),),),.,.,.,(,(,(,(,(,(,(,.,.,.,.,),),),),),),),.
A,G,A,U,G,C,U,C,A,G,U,C,C,A,A,G,A,G,C,G,A,A,A,G,C,U,C,A,A,G,G,A,C,A,G,A,U,C,G,U,C,G,A,A,A,G,A,C,G,A,U,C,A,G,A,G,C,A,U,C,A,A,A,G,G,U,A,C,U,C,U,U,C,G,G,A,G,U,A,C,C,A,A,A,A,G,A,A,A,C,A,A,C,A,A,C,A,A,C,A,A,C
This is data with GGAAA removed. Then go to lab 3105090 (Brourd's entry, 5th from the top), move over the NTs, and hover until the numbers match. You should see the problem.
Let me know what you think, or if it is me.
Thanks.
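
A quick way to check this kind of offset is to count the entries in each of the three rows (data, structure, sequence) and compare them. The following is a minimal sketch, not anyone's actual tool: the function names are made up for illustration, and it assumes each row is a single comma-separated line like the ones pasted above.

```python
def split_row(row):
    """Split one comma-separated row into a list of stripped fields."""
    return [field.strip() for field in row.split(",")]


def report_alignment(data_row, struct_row, seq_row):
    """Return the length of each row and how far each is from the sequence length."""
    data = [float(x) for x in split_row(data_row)]
    struct = split_row(struct_row)
    seq = split_row(seq_row)
    return {
        "data": len(data),
        "structure": len(struct),
        "sequence": len(seq),
        "sequence_minus_data": len(seq) - len(data),
        "sequence_minus_structure": len(seq) - len(struct),
    }


# Toy rows, not the real lab data: 4 data points, 5 structure symbols, 6 bases.
example = report_alignment("1, 0.5, 0.2, 0.9", ".,(,.,),.", "A,G,C,A,U,A")
print(example)
```

A non-zero difference only says that the rows have different lengths. It does not say at which end the extra or missing entries are, so the hover check described above is still needed.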

JR
And while we are at it, if you create the file that gets downloaded from the list screen, I will make sure all 3 sources are in sync. It should save you a few hours of work, which I don't mind doing.
Thanks.

Eli Fisker
Hi JR, it is not just you. I was wondering too why the data suddenly looked a lot different in Omei's tool:



It looks like the yellow loop coloring is pushed 5 bases to the left.

However the data starts with GGAAA at position 6:


So the data we receive have changed.

Omei Turnbull, Player Developer
I just checked Eli's example for myself, and it looks ok now. Have the other problem cases cleared up?

JR
From the chat, it sounded like Jee is busy working on other important stuff.
I posted this so that when they get a chance, the devs know where to look.
I'll wait until they post a reply before running all my numbers again.

jandersonlee
When I do a /get/?type=puzzle&nid=$NID I see a field called usetails that is "1" for the lab puzzle and null for the player project. I'll try using this to determine whether to assume a GGAAA prefix on the sequence and SHAPE or not.

jandersonlee
It seems to help so far...

jandersonlee
Thanks to Dennis9600/ElNando888 for the tip:

@Dennis9600: Something new is going on with "tails" of the sequence in the labs. Either 'GG' or 'GGAAA' must be put in front of every design sequence returned by the html GET "http://eterna.cmu.edu/get/?type=puzzl...=". ElNando888 let me know that the way to get the right one is to look at the JSON field "usetails". It always used to be a "1" and the correct starting tail always used to be "GGAAA". Now if you see a "2" in "usetails", you need to use the new sequence "GG". (The 21 nucleotide ending tail has not changed.)
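
The rule in that tip can be written down as a small lookup. This is only a sketch under stated assumptions: the function names are hypothetical, the puzzle is assumed to be already parsed from JSON into a dict, and treating a null "usetails" as "no prefix" is the working guess from the earlier post about player projects, not something confirmed by the devs.

```python
# "1" -> "GGAAA" and "2" -> "GG" come from the tip above.
# None -> "" is an assumption (null "usetails" seen on a player project).
FIVE_PRIME_TAILS = {"1": "GGAAA", "2": "GG", None: ""}


def five_prime_tail(puzzle):
    """Pick the 5' tail for a parsed puzzle dict based on its "usetails" field."""
    usetails = puzzle.get("usetails")
    key = None if usetails is None else str(usetails)
    if key not in FIVE_PRIME_TAILS:
        # Fail loudly on values this thread does not describe.
        raise ValueError(f"unknown usetails value: {usetails!r}")
    return FIVE_PRIME_TAILS[key]


def with_five_prime_tail(puzzle, design_sequence):
    """Put the selected 5' tail in front of a design sequence."""
    return five_prime_tail(puzzle) + design_sequence


print(with_five_prime_tail({"usetails": "2"}, "ACGU"))
```

Raising on an unknown value, rather than silently defaulting to "GGAAA", should make it obvious if another tail variant shows up later.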

nando, Player Developer
Just to be pedantically precise, the 3' tail is 20 bases long, not 21. The 21st base, a locked and unpaired A, is part of the barcode, which has this structure notation: (((((((....))))))). [notice the last dot]
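
For aligning, it can help to cut a full sequence into its three parts. The sketch below is illustrative only: the function name is made up, the 20-base length of the 3' tail is taken from the note above, and the middle part it returns still contains the barcode (including the locked, unpaired A).

```python
THREE_PRIME_TAIL_LENGTH = 20  # per the note above: 20 bases, not 21


def split_full_sequence(full_sequence, five_prime_tail):
    """Split a full sequence into (5' tail, design plus barcode, 3' tail)."""
    if not full_sequence.startswith(five_prime_tail):
        raise ValueError("sequence does not start with the expected 5' tail")
    if len(full_sequence) < len(five_prime_tail) + THREE_PRIME_TAIL_LENGTH:
        raise ValueError("sequence is too short to contain both tails")
    body_end = len(full_sequence) - THREE_PRIME_TAIL_LENGTH
    middle = full_sequence[len(five_prime_tail):body_end]
    return five_prime_tail, middle, full_sequence[body_end:]


# Toy sequence, not real lab data: "GGAAA" + 4-base middle + 20-base tail.
parts = split_full_sequence("GGAAA" + "ACGU" + "C" * 20, "GGAAA")
print(parts)
```

The same slice positions can be applied to the structure string and the data row, as long as all three cover the same full-length sequence.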

jandersonlee
Having problems aligning once again. It seems that there are six labs where I have trouble aligning the sequence with the secstruct. Here are the lab nids and the counts of solns that seem to misalign:

{'3414850': 180, '3414839': 250, '3414828': 243, '3622020': 128, '3282859': 16, '3414861': 130}

For now I'm ignoring them in my "data mining". I may come back to it later to try and figure out what's not quite right.
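
A per-lab tally like the one above can be produced with a simple length check. This is a rough sketch rather than the actual data-mining code: the record shape (lab nid, sequence, secstruct) and the function name are assumptions, and "misaligned" here just means the sequence and secstruct differ in length.

```python
from collections import Counter


def count_misaligned(solutions):
    """Count, per lab nid, the solutions whose sequence and secstruct lengths differ.

    `solutions` is an iterable of (lab_nid, sequence, secstruct) tuples,
    a hypothetical record shape used only for this example.
    """
    counts = Counter()
    for lab_nid, sequence, secstruct in solutions:
        if len(sequence) != len(secstruct):
            counts[lab_nid] += 1
    return dict(counts)


# Toy records, not real lab data.
toy = [
    ("lab_a", "GGAAAACGU", "....(..)."),  # same length: aligned
    ("lab_a", "ACGU", "....(..)."),       # different length: misaligned
    ("lab_b", "GGACGU", "(....)"),        # same length: aligned
]
print(count_misaligned(toy))
```

Labs with a non-zero count can then be skipped, or looked at by hand to see which tail assumption fails for them.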