CSV File of Complete Lab Design Submission Data for Download

  • 6
  • Idea
  • Updated 7 years ago
Hi to all the Devs!

Many of us EteRNA addicts are also "spreadsheet nuts." I know that I and at least a few others would love to be able to download the complete Lab Submission Data Listing (with full sequence data) into a spreadsheet, so that we can crunch away to our hearts' content without entering the data manually, which would be prohibitive given the current number of submissions.

What say you, Devs? Could a CSV file of the Lab Data be generated and updated automatically, so that it always reflects the current state of the data in the online Lab submission window? And could that CSV file then be made available for download?

Thanks, and Best Regards,

-d9

dimension9

  • 186 Posts
  • 45 Reply Likes
  • happy

Posted 9 years ago


Berex NZ

  • 116 Posts
  • 20 Reply Likes
I wouldn't mind having the library of puzzles as well.
Need a control dataset to figure out if there is an optimal formula.

Thanks
Berex

Pytho

  • 2 Posts
  • 2 Reply Likes
Did you have a look at this file?
http://eterna.cmu.edu/sites/cached_cs...
I think it's exactly what you wanted. (For the current lab)

Chris Cunningham [ccccc]

  • 97 Posts
  • 13 Reply Likes
Whoa! Yes, tell us!

If I have a theory like "there is a correlation between how often Gs are adjacent to each other and how well the RNA folds," it is hard to test it without the list of sequences.

Of even more interest is the SHAPE column -- how do we interpret that? It looks excellent.
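A theory like the adjacent-G one can be checked in a few lines once the sequences are in a CSV. The sketch below is only an illustration: the column names `sequence` and `score` and the sample rows are assumptions, not the layout of the actual file.

```python
import csv
import io
import math


def adjacent_g_count(sequence):
    """Count positions where a G is immediately followed by another G."""
    seq = sequence.upper()
    return sum(1 for a, b in zip(seq, seq[1:]) if a == "G" and b == "G")


def pearson(xs, ys):
    """Pearson correlation coefficient, or None when it is undefined."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def gg_vs_score(csv_text, seq_col="sequence", score_col="score"):
    """Correlate adjacent-G counts with scores; column names are assumed."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    xs = [adjacent_g_count(r[seq_col]) for r in rows]
    ys = [float(r[score_col]) for r in rows]
    return pearson(xs, ys)


# Made-up rows, only to show the call; real data would come from the download.
SAMPLE = "sequence,score\nGGGA,10\nGAGA,30\nGGAA,20\n"
print(gg_vs_score(SAMPLE))
```

A correlation on its own would not show that adjacent Gs cause poor folding, but it is a quick first filter for ideas like this.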

Berex NZ
Hi Pytho, could you please direct us as to how to grab that data for ourselves?

Ding

  • 94 Posts
  • 20 Reply Likes
Berex - if you click the link Pytho gave, it downloads an updated file in comma-separated values (CSV) format. You should be able to open it in a spreadsheet program (OpenOffice, Microsoft Excel, ...).

Berex NZ
Most excellent~! Thank you Ding.
I was assuming it was a static file that wouldn't have any of the current round 4 submissions! :)

Berex NZ
Hi Pytho,
I've noticed that since the new lab was put up, this link no longer gives me the latest submissions. Can you please explain how we can get the data ourselves?

Berex NZ
Pytho, how did you get that data?
According to that data, they've been synthesizing a lot more submissions....?

Most interestingly, jeehyung's submission got a 94 and so did a christmas tree.... Both in round one.

If anyone wants to find them faster, they are nids 26607 and 27014.

dimension9
Jee. I apologize, but I cannot see HOW this all-G-C design can possibly be closest to Donald's 94% winner, when all the other all-G-C designs received the absolute minimum score. It seems something must be wrong somewhere. Please see my last entry above; edits were added since you read it.

Ding
My guess from looking at the numbers is that the "similarity" is based very simply on the number of positions at which two designs have the same nucleotide. So if Christmas Tree had the same orientations of GC pairs at all the positions where Donald's design had GC pairs, that would lead to relatively high "similarity", even though other measures such as the overall percentage of different kinds of bonds/nucleotides, melting point, and free energy might be very different.

Another thing to note is that I don't think any actual "christmas trees" were synthesized in that particular round, and I think the comparison only goes within a round.
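To make that guess concrete, here is a minimal sketch of a purely positional similarity. It is an assumption about how the score might work, not the actual EteRNA algorithm, and the function names and toy sequences are made up for illustration.

```python
def positional_matches(a, b):
    """Number of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(1 for x, y in zip(a, b) if x == y)


def positional_similarity(a, b):
    """Fraction of positions that agree, between 0.0 and 1.0."""
    return positional_matches(a, b) / len(a)


# Toy sequences: they share every G and C position in the first half
# but nothing in the second half.
print(positional_matches("GGCCAAAA", "GGCCGCGC"))     # 4
print(positional_similarity("GGCCAAAA", "GGCCGCGC"))  # 0.5
```

A measure like this looks only at position-by-position identity, so it says nothing about pairing, melting point, or free energy.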

Jeehyung Lee, Alum

  • 708 Posts
  • 94 Reply Likes
Ding's right - comparison only goes within rounds. That Christmas tree was submitted in round 1, and no all-GC-pair design was synthesized in round 1.

As you can see, it differs by 43 bases from the 94 design, which means about half of the bases are different; they are basically completely different RNAs. The tree was completely different from all 8 synthesized designs, but it happened that the 94 design was relatively the closest.

This scheme was designed so that every design in the lab (even a non-synthesized one) gets at least some score, and people can get at least some reward for participating. This obviously isn't an optimal way of evaluating. We are actively looking for a better way to evaluate non-synthesized designs, along with a better way of selecting candidates.
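As a rough sketch of the scheme described above (not the actual EteRNA code): assume a non-synthesized design simply inherits the score of whichever synthesized design in the same round differs from it at the fewest positions. The function names and sample sequences are invented for illustration.

```python
def differing_bases(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def inherited_score(candidate, synthesized):
    """Return (score, distance) of the nearest synthesized design.

    `synthesized` is a list of (sequence, score) pairs from the same round.
    """
    if not synthesized:
        raise ValueError("no synthesized designs to compare against")
    seq, score = min(synthesized, key=lambda d: differing_bases(candidate, d[0]))
    return score, differing_bases(candidate, seq)


# Made-up round: the candidate is not close to anything, yet it still
# inherits the top score because that design happens to be the least far away.
round_one = [("GGGGAAAA", 94), ("AUAUAUAU", 10)]
print(inherited_score("GGGGCCCC", round_one))  # (94, 4)
```

The example shows the weakness raised in this thread: "nearest" can still mean half the bases differ.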

dimension9
@Ding: I fear you may be correct, and I say "fear" because that would be a serious flaw if it means that the same design could score 10% (the minimum) in one round and 94% in another, just because of a positional similarity of G-C pairs to a very good design, irrespective of the quality of the rest of the design. I am certain the developers would not want a perception of possible inequities in the scoring system, or of possible flaws in the similarity algorithm, to proliferate. I hope something can be done.

Ding
@d9, jee:

I agree with d9 that it's a pretty serious flaw. A couple "quick and dirty" fixes might be:

1) Don't limit comparison to within a round; compare to previous rounds as well. This wouldn't change what happened with Christmas Tree getting 94, since it was first-round, but if we have another christmas-tree-free synthesis round, future christmas trees would be compared to the ones that have already failed rather than to less similar designs.

2) only reward designs that are similar above a certain limit. Don't get me wrong, I liked getting points for my Round Two submission, but it shared only half the nucleotides of its "closest" match and was otherwise very different (and more similar to other candidates in certain ways). Getting a reward based on synthesis of a completely different RNA felt a bit cheap ;)
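Both fixes can be sketched together. This is a hypothetical illustration only: the 0.8 cutoff, the function names, and the sample data are assumptions, not anything the developers have specified.

```python
def similarity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def thresholded_score(candidate, rounds, min_similarity=0.8):
    """Score from the most similar synthesized design across all rounds.

    `rounds` maps a round number to a list of (sequence, score) pairs.
    Returns None when no design reaches `min_similarity` (an assumed cutoff).
    """
    if not candidate:
        raise ValueError("candidate sequence is empty")
    # Fix 1: pool synthesized designs from every round, not just the current one.
    pool = [d for designs in rounds.values() for d in designs
            if len(d[0]) == len(candidate)]
    if not pool:
        return None
    seq, score = max(pool, key=lambda d: similarity(candidate, d[0]))
    # Fix 2: only reward matches above the similarity limit.
    return score if similarity(candidate, seq) >= min_similarity else None


# Made-up data: round 1 has a high scorer, round 2 a failed GC-heavy design.
rounds = {1: [("GGGGAAAA", 94)], 2: [("GGGGCCCA", 10)]}
print(thresholded_score("GGGGCCCC", rounds))                  # 10
print(thresholded_score("GGGGCCCC", {1: rounds[1]}))          # None
```

With the pooled comparison the GC-heavy candidate is matched to the failed design it actually resembles, and with round 1 alone it gets no reward rather than a borrowed 94.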

AnticNoise

  • 11 Posts
  • 3 Reply Likes
How are all those synthesis scores calculated...? I'm very curious about that.

Christopher VanLang, Alum

  • 30 Posts
  • 9 Reply Likes
This itself should be a question on getsatisfaction.

Jeehyung Lee, Alum
This has been added to the task list (case #399). Thanks for the idea!

EteRNA team

dimension9
Very glad this is now on the list for closer developer scrutiny. Under the current system, not only can a Christmas Tree receive a 94% for being "relatively" close (if 43 out of 85 can be considered "relatively close") to a high-scoring winning design, but perhaps even worse, the HUNDREDS of players who receive relatively high scores for designs that might not have merited them had they actually been synthesized may then erroneously deduce that their (perhaps unsound) design was "really" a high scorer on its own merit. They may then carry the wrong lessons into future rounds, duplicating weaknesses that the "closest-to" scoring system never revealed.

Ding
Just bumping this comment, wondering if there have been any further thoughts on the subject.

I notice that in round 5, a handful of 30+ GC designs scored 94+ since it was overall a high-scoring round with no actual GC-heavy designs voted for synthesis.

Jeehyung Lee, Alum
Ding & dimension9

We are redesigning the scoring system along with the new synthesis candidate selection algorithm. At the very least, RNAs that differ by more than a certain percentage will be considered "different" and won't get rewarded as they are now.

Ding
thanks jee, that's good to hear :)

Berex NZ
The CSV link doesn't give me the submissions for the latest lab.
Can somebody point me to the new file location please?

Thank you!

jandersonlee

  • 554 Posts
  • 129 Reply Likes
While we wait for a full EteRNA 2.0 database interface, something that would spit out a .CSV file of lab results would be a big help to those of us trying to mine the results for ideas. Even something that had just the following fields would do:

id,score,melt,FE,sequence,targetshape,bonds

where:

Id is the submission number
Score is the synthesis score
Melt is the estimated melt point
FE is the computed free energy
Sequence is the RNA sequence e.g. "AGCAAAGCA"
TargetShape is the secondary shape string e.g. ".((....))."
Bonds is the bonding result as explained below

Bonds could be something like "0110000110" if it is just a binary 0=unbonded 1=bonded estimate or something like "0891012871" using 0..9 for a 10-way binned estimate of the bonding of each nucleotide.
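For illustration, here is a sketch of how a file with those seven fields could be read. The field names come from the list above; the sample row (including its id, score, melt, and FE values) is made up, and the `LabResult` class and `parse_results` function are hypothetical names.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class LabResult:
    id: int
    score: float
    melt: float
    fe: float
    sequence: str
    targetshape: str
    bonds: list  # one integer per nucleotide: 0/1, or a 0..9 bin


def parse_results(csv_text):
    """Parse rows with the header id,score,melt,FE,sequence,targetshape,bonds."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        bonds = [int(ch) for ch in row["bonds"]]
        if not (len(row["sequence"]) == len(row["targetshape"]) == len(bonds)):
            raise ValueError("sequence, targetshape and bonds lengths differ")
        results.append(LabResult(
            id=int(row["id"]),
            score=float(row["score"]),
            melt=float(row["melt"]),
            fe=float(row["FE"]),
            sequence=row["sequence"],
            targetshape=row["targetshape"],
            bonds=bonds,
        ))
    return results


# Made-up row with a 10-nucleotide sequence so all three strings line up.
SAMPLE = (
    "id,score,melt,FE,sequence,targetshape,bonds\n"
    "1,94,67.0,-3.2,AGCAAAAGCA,.((....)).,0110000110\n"
)
print(parse_results(SAMPLE)[0].bonds)
```

Reading `bonds` as text matters here: a spreadsheet program may treat a digit string such as 0110000110 as a number and drop the leading zero.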

Other fields that might help are a lab-id to distinguish between different labs, and a best-guess secondary-shape since the system seems to predict how the sequence actually folded. It also wouldn't hurt to add the CG, GU and AU counts since you have them.
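The pair counts could also be derived from the sequence and shape string alone. A minimal sketch, assuming the shape uses plain dot-bracket notation without pseudoknots; `pair_counts` is a hypothetical helper, not an EteRNA function.

```python
def pair_counts(sequence, shape):
    """Count CG, GU and AU pairs implied by a dot-bracket shape string."""
    if len(sequence) != len(shape):
        raise ValueError("sequence and shape must have equal length")
    counts = {"CG": 0, "GU": 0, "AU": 0, "other": 0}
    stack = []  # indices of unmatched "("
    for i, ch in enumerate(shape):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced shape: extra ')'")
            j = stack.pop()
            # Sort the two bases so that G-C and C-G both map to "CG".
            pair = "".join(sorted((sequence[j].upper(), sequence[i].upper())))
            counts[pair if pair in counts else "other"] += 1
    if stack:
        raise ValueError("unbalanced shape: extra '('")
    return counts


print(pair_counts("AGCAAAAGCA", ".((....))."))
# {'CG': 2, 'GU': 0, 'AU': 0, 'other': 0}
```

These counts describe the target shape only; pairs in the best-guess folded shape would need that shape string instead.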

Either something that would spit out the results for a given lab (at the bottom of the view results page?) or one .csv file to rule them all (including all lab results) would suffice. We can do without a complex search interface for now. Even a more current single static file would do if that's all you have time for. (The old cached link doesn't seem to be there anymore.)

Thanks!

jandersonlee
Oh, and for "switch puzzles", you could use two separate result lines, with an extra field to distinguish whether the target molecule (FMN) is present or not.

jandersonlee
Is this on the implementation list yet? I sure wish I could get a .CSV file for the lab submissions/results. Thanks!