Simple rating system for synthesis candidate selection

  • 3
  • Question
  • Updated 8 years ago
  • Answered
Hello all,

For the past month we have been discussing a new lab synthesis candidate selection system - an "Elo" rating system - and have run a beta test of it.

http://getsatisfaction.com/eternagame...
(A link to elo rating system discussion)

Recently, there was a discussion among devs about a simpler selection system: one that just asks you to "guess" the synthesis score of a given RNA, instead of comparing two designs, and rewards you based on how accurate your prediction was.

http://getsatisfaction.com/eternagame...
(aldo has also discussed a similar idea in this post)

An advantage of this system is that we'll get much more direct and fine-grained information from you (players) for selecting synthesis candidates. This might lead to better performance overall.

A possible disadvantage is that it's harder for new players to "predict" exact scores than to vote or to pick the better of two designs (the Elo rating system).

What do you think of the system?

As for the implementation timeline, this system is fairly simple, so we can safely assume that both the Elo system and the new rating system would settle in within the same amount of time.
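The thread never specifies how prediction accuracy would translate into points; a minimal sketch, assuming a simple linear penalty on prediction error (the 25-point cutoff and function name are entirely hypothetical), might look like this:

```python
def prediction_reward(predicted, actual, max_points=100):
    """Hypothetical reward: full points for an exact guess,
    tapering linearly to zero once the error reaches 25 points."""
    error = abs(predicted - actual)
    return max(0.0, max_points * (1 - error / 25))

# Guessing 85 for a design that synthesizes at 90 (5 points off):
print(prediction_reward(85, 90))  # 80.0
```

Any monotone penalty would do; the key design choice is how steeply to punish large errors, which also determines how risky an extreme prediction is.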
Jeehyung Lee, Alum

  • 708 Posts
  • 94 Reply Likes

Posted 8 years ago

dimension9

  • 186 Posts
  • 45 Reply Likes
Sounds interesting, but I would like to see the results of Elo 1 before going on to this Elo 2.

I also cannot help but wonder whether the suggestion of an entirely new rating system here, at this time, is an indication that Elo 1 may not be working as hoped, in the devs' estimation. I hope this is not the case, but the timing seems to suggest it.

In short, I think it may be better to finish chewing and swallow before taking another bite.
Matt Baumgartner [mpb21], Alum

  • 128 Posts
  • 33 Reply Likes
d9, it is not so much that Elo 1 is not working; it's that we want to make sure we have the best system before we roll it out, because a constantly changing system is confusing to players and makes data collection and analysis more difficult.
dimension9

  • 186 Posts
  • 45 Reply Likes
Hi Matt, if we try this "simple" prediction system now, we will have three systems going simultaneously (normal voting, Elo, and the prediction system); now THAT is confusing, especially since we have not yet seen any results from Elo to judge how it is working or how it would change the designs sent to synthesis. I agree with you; let's not do anything confusing to players, and a third test running simultaneously would be. One test should be completed and evaluated before starting anything new.
alan.robot

  • 91 Posts
  • 36 Reply Likes
The way I see it, the goals of the ideal voting system would be to provide consistent player-enforced quality control to weed out known losing strategies, while simultaneously making it easier for new, innovative designers to get noticed and synthesized, even in a sea of hundreds of entries.

Simple voting is intuitive, but it is indeed hard to get noticed if you do not have a reputation as a good designer; plus, there are too many designs to choose from, so the safe bets get all the votes ("snowballing"). I saw several promising designs from new players that never got more than 1 or 2 votes.

In the past, I have also seen all-GC designs with high votes, and the only thing I could do to compensate was vote the next-lowest reasonable design up to try and bump the all-GC design down in the rankings.

With Elo 1, I could consistently pick other designs over the all-GC one, which is probably the easiest comparison to make. But I can see that too many pairwise comparisons may be needed to get a statistically significant ranking from all participating lab members as the number of submissions approaches triple digits, and this is the central limitation of the system. Unless everyone does hundreds of reviews, I don't think it will scale well.

With Elo 1 + the comment system, the advantage is that designers I've never heard of have their design pop up randomly on my screen, and I can leave them some feedback to point them in the direction of something that would stand a good chance if synthesized. I never would have looked at them if it weren't for the random aspect of the reviewing system.

The idea proposed above would essentially allow negative confidence in a design to be expressed as well as positive confidence (sort of like negative voting). This is an interesting idea; it would be neat if players could accumulate some sort of ranking based on how accurate their scoring assignments are. There could be some weighting involved too, so that extreme predictions are weighted by how experienced a player has been in past voting rounds (like karma on Slashdot or something).

I still think reviews should be randomly assigned; it's the only way to guarantee new designs get noticed.
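For reference, the pairwise system discussed in this thread takes its name from the standard Elo formula used in chess; whether EternaGame's beta used exactly these constants is not stated here, so the K-factor of 32 and the 1500 baseline below are assumptions:

```python
def elo_update(winner, loser, k=32.0):
    """One standard Elo update: compute the winner's expected score,
    then move both ratings by the same amount in opposite directions."""
    expected_win = 1 / (1 + 10 ** ((loser - winner) / 400))
    delta = k * (1 - expected_win)
    return winner + delta, loser - delta

# Two designs entering at a default rating of 1500:
print(elo_update(1500, 1500))  # (1516.0, 1484.0)
```

The scaling worry above is concrete: with ~100 submissions there are 4,950 distinct pairs, so each player's comparisons sample only a small fraction of them.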
mat747

  • 130 Posts
  • 38 Reply Likes
I agree with d9 that we need to see the results from the Elo 1 beta before going on to Elo 2.

"guess a synthesis score of a given RNA"
Yes, that system would be a lot harder for new and current players, and the time needed to analyse the results would increase too.
Berex NZ

  • 116 Posts
  • 20 Reply Likes
I would hold off on this idea for now. Even though I can see its potential benefit, I think we're missing evaluative outputs that we can use as measures, so the benefit would be muted.

Because the field of RNA is so new, so to speak, I think you will have to accept that changes will always be necessary to keep up with the latest developments. The best system would be one that is up for review every 6 months, unless there's a new measure that we can immediately put to use and that is an easy win.

I think that now that we're moving to more quantifiable voting systems, we should be careful. I wouldn't be surprised if there is quite a wide disparity in how Elo 1 is shaping the results, even amongst the top players. (By the way, I'd love to see how Elo is currently performing.)

Pushing for another system change, into more unknown territory, will I think just lead players to rely more on dot plots and web servers like RNAfold, which I think blunts the effect of using human spatial awareness and recognition. I don't know about other people, but I have trouble determining whose design is the best out of Ding's, Mat's, mine, and d9's, much less what it'll actually score. So I don't blame people for only voting for the designs that are already leading in votes, aka snowballing.

Now, if I played the inverse card, I could quite comfortably predict what an all-GC design would score. Would I get full points for that, or will it not be allowed? There are designs that we know are going to fail; technically, the guess would be correct.

On a totally different side note that popped into my head from this thread: what if, when your design makes it into the top 8 for synthesis one week, all your designs have to sit out the following round? You could still submit designs; they just wouldn't be in the running for synthesis.
aldo

  • 35 Posts
  • 2 Reply Likes
I don't know about other people, but I have trouble determining whose design is the best out of Ding's, Mat's, mine and d9's. Much less what it'll actually score.

But certainly you have some idea of the ballpark of their scores. It doesn't have to be exact, just an informed estimate. As far as synthesis selection is concerned, what matters is not the relative scores of the top eight designs but rather the fact that they are all judged to be better than the rest.

Now if I played the inverse card, I could quite comfortably predict what an all GC design would score. Would I get full points for that, or are you going to not make it allowable? There are designs that we know are going to fail, technically the guess would be correct.

If enough other people predict it will fail, it won't be selected for synthesis, so no one will get points for it. If on the other hand enough people think it will do well that it gets selected for synthesis, and you're among the few who correctly predict it will fail, you probably deserve the full points.
wisdave

  • 27 Posts
  • 1 Reply Like
As a new player, I fully admit to snowballing. I simply don't know what to do. As a player ranked in the top 20, I have a good grasp of creating puzzle-solving designs. But I am baffled by what makes a better design: low, high, or medium energy? More or fewer of certain nucleotide pairs? Short of taking classes here at UW-Madison, what would you suggest? I'm afraid I'm causing more damage than contribution in the lab. Thanks.
Ding

  • 94 Posts
  • 20 Reply Likes
I think the main thing to do is look through past lab results to see what has been tried and what worked or didn't work. And read old posts on GetSat.

It also helped me in the beginning to modify already-synthesized designs rather than try to design my own from scratch. That way you can see which parts failed and think of ways to fix them; it's a good way of getting a sense of what does and doesn't work that only really comes through practice (which we're all still working on). Plus, I've noticed that, especially in later rounds of a shape, new players are a lot more likely to get a design voted for synthesis if it's a modification of a design that already did fairly well (I know that's how I got my first two voted up).
dimension9

  • 186 Posts
  • 45 Reply Likes
Hi wisdave - also, take a look at the table at the end of this post to get a rough idea of past successful percentages of each kind of base-pair:

http://getsatisfaction.com/eternagame...

Good Luck!

-d9
alan.robot

  • 91 Posts
  • 36 Reply Likes
Wisdave - this may be a totally irrelevant comment if you are fully invested in another field of study, but I can vouch that if Tom Record is still teaching biophysical chemistry at UW, you might try sitting in when you can, because he's the bee's knees in biochemical thermodynamics, which is ultimately what this game is all about :-)
JRStern

  • 42 Posts
  • 2 Reply Likes
I'm with Dave; I haven't a clue what you even have in mind about guessing a score. What's a score anyway - points out of 100? How is it computed? For that matter, do you go back after the fact, rate our submissions, and show the scores? I know we get lab rewards based on the scores. I've played most available puzzles but still don't really grok the lab side at all.
Ding

  • 94 Posts
  • 20 Reply Likes
@JRStern - rhiju (who runs the lab where our RNA designs are synthesized and tested) explains the scoring system in his response to this thread: http://getsatisfaction.com/eternagame...

I tried to write out a layman's version, but it ended up way too long and no clearer than rhiju's explanation :)

As far as the scores of designs that aren't synthesized go, right now each one is just assigned the same score as the synthesized design from its submission round that has the most nucleotides in common with it. Those scores are what the lab rewards are based on. I'd take them with a grain (or a bowlful) of salt, since a design can share very few nucleotides with any of the synthesized designs but still has to be assigned a score - I've seen designs whose "closest" synthesized design shared only about half its nucleotides.

I think part of either proposal for lab reform is doing away with lab rewards for "closest" sequences. In Elo, we'd be rewarded based on how accurate our comparisons were, but only between pairs of designs that both get synthesized; in this rating system, I think we'd only be rewarded for how closely we guessed the scores of designs that are actually synthesized.
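Ding's description of the "closest synthesized design" proxy can be sketched as follows (the sequences, dictionary layout, and function names are invented for illustration):

```python
def shared_nucleotides(seq_a, seq_b):
    """Count positions at which two equal-length sequences agree."""
    return sum(a == b for a, b in zip(seq_a, seq_b))

def proxy_score(design_seq, synthesized):
    """Assign an unsynthesized design the score of the synthesized
    design it shares the most nucleotides with."""
    closest = max(synthesized,
                  key=lambda s: shared_nucleotides(design_seq, s["seq"]))
    return closest["score"]

synthesized = [{"seq": "GCAUGC", "score": 90},
               {"seq": "GGGGGG", "score": 40}]
print(proxy_score("GCAUGG", synthesized))  # 90
```

Note that `max` always returns some "closest" design, even when the best overlap is barely half the sequence - exactly the grain-of-salt caveat above.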
aldo

  • 35 Posts
  • 2 Reply Likes
Since apparently there are lots of doubts as to whether this would work or would be preferable to the alternatives, and since it's such a simple system to implement (i.e. add a "Predict" button to the design info dialog, a "My Prediction" column to the lab table, and a formula to assign points for predictions), why not roll it out at first just as an extra way to earn points, and then look at the results to see whether they could also be used to select designs for synthesis? That would also allow you to try out various selection criteria (highest average prediction, highest median prediction, highest Elo rating after automated pairwise comparison, etc.) on real prediction data before settling on one to use for actual selection.
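Comparing aldo's candidate selection criteria on collected predictions could be as simple as swapping an aggregation function; the design names and prediction values below are invented:

```python
from statistics import mean, median

def select_top(designs, criterion, k=8):
    """Rank designs by an aggregate of their player predictions
    and return the top k."""
    ranked = sorted(designs, key=lambda d: criterion(d["predictions"]),
                    reverse=True)
    return ranked[:k]

# Invented data: design A has one harsh outlier prediction.
designs = [
    {"name": "A", "predictions": [90, 92, 30]},
    {"name": "B", "predictions": [85, 84, 86]},
]
print([d["name"] for d in select_top(designs, mean, k=1)])    # ['B']
print([d["name"] for d in select_top(designs, median, k=1)])  # ['A']
```

A single harsh outlier sinks A under the mean but not under the median; that difference is exactly why the criteria are worth comparing on real prediction data before one is chosen for actual selection.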
wisdave

  • 27 Posts
  • 1 Reply Like
Thanks to everyone for the excellent suggestions.
wisdave

  • 27 Posts
  • 1 Reply Like
OK. I played around with d9's spreadsheet and looked at everything synthesized at 95 and above. I also noted the melting points and energy levels. I modified one of my designs to fit within these parameters and took into account the comments about repeating patterns of nucleotides. I think it looks much better now. Unfortunately, I'll have to wait for the next round, as I have used up my three solutions. Again, thanks for the help.

@alan.robot - Thanks for the comment, but I'm at the tail end of a career in business. It will be a couple of years before I retire and could take a few classes. I just stumbled on this a couple of months ago and was taken in by the possibility of designing RNA. I might get one synthesized yet. I'll keep working on it now that I have a few tips.
aldo

  • 35 Posts
  • 2 Reply Likes
Unfortunately, I'll have to wait for the next round, as I have used up my three solutions.

You can delete one of them if you want; you just have to unvote it first. Just make sure you copy the sequence and save it somewhere before deleting, in case you want to refer back to that design later.
alan.robot

  • 91 Posts
  • 36 Reply Likes
@wisdave - congrats on your upcoming retirement! I realize it was not the most practical suggestion, nor am I implying that people should have to take courses to design. I just wanted to convey that you have an unusual resource in your backyard: the UW biochemistry department is a real standout, and I can say that never having attended there, but having interacted with and followed the work of many of the profs there and their students, who are all doing amazing work.

FYI, UW-Madison allows seniors to "guest audit" classes for free with the instructor's permission:
http://www.dcs.wisc.edu/info/audit60.htm

But barring that, here are some online materials you might find interesting right away:

First is a video lecture describing, in broad terms, how the eteRNA folding algorithm works:
http://echoserver.sinc.stonybrook.edu...

and second is a powerpoint introduction to RNA structure:

http://www.vadlo.com/b/trk?uid=a5e18e...

enjoy!
aldo

  • 35 Posts
  • 2 Reply Likes
@alan.robot: According to slide 19 of the powerpoint, apparently what we call a "bulge" is just a small internal loop, and an actual bulge is something else (what I would have called a "kink" if I hadn't read this powerpoint).

Another illustration: http://www.nature.com/nrm/journal/v5/...
alan.robot

  • 91 Posts
  • 36 Reply Likes
Sharp eye, aldo - I completely missed that! Being precise with the terminology also helps you match up with the right table of energy terms:
http://rna.urmc.rochester.edu/NNDB/tu...
Jeehyung Lee, Alum

  • 708 Posts
  • 94 Reply Likes
We got some preliminary results from the new lab on "The Star."

http://eterna.cmu.edu/news/393375
wisdave

  • 27 Posts
  • 1 Reply Like
Thanks, aldo. Done
aldo

  • 35 Posts
  • 2 Reply Likes
I've kept track of my own predictions for a few previous rounds out of curiosity. Now that Jeehyung has released results from the pairwise comparison beta, I've plotted my predictions against actual scores and gotten a Kendall tau rank correlation coefficient of 0.511. That seems pretty good to me, especially since I'm only one person and nowhere near the savviest lab participant. In addition, my predictions have become more accurate in successive lab rounds, which suggests I'm learning to predict better.

The Kendall tau I got probably can't be directly compared to the one Jeehyung calculated for the beta results, for a number of reasons, but it does seem to bode well for a prediction-based selection system. I've made predictions for Lab 201 Round 1 as well, so I'll post an update when the synthesis results come out and compare them to the beta results for that round (which should be a fairer fight, since the beta will only have to deal with one round's worth of designs instead of four). The best thing would still be to implement predictions as just a way to earn points first, so we can get data from more than just one player.

Here are my predictions (left column) and the actual synthesis scores (right column), along with the mean absolute error (MAE) and Kendall's tau for each round:

Lab 103 Round 3
90 78
60 88
75 15
75 90
85 86
80 73
10 10
85 87
MAE = 15.63
tau = 0.148

Lab 103 Round 4
88 91
94 96
80 97
93 95
94 94
94 97
70 78
70 68
MAE = 4.63
tau = 0.511

Lab 104 Round 4
96 92
93 89
96 94
88 93
94 91
85 87
97 92
93 88
MAE = 3.75
tau = 0.491
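As a cross-check, the Lab 103 Round 4 statistics above can be reproduced in a few lines. Which tau variant aldo used isn't stated; the tie-corrected tau-b below does match his 0.511:

```python
def mae(pred, actual):
    """Mean absolute error between predictions and synthesis scores."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)

def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation (corrected for ties), pure Python."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    return (concordant - discordant) / ((n0 - ties_x) * (n0 - ties_y)) ** 0.5

# aldo's Lab 103 Round 4 predictions vs. actual synthesis scores:
pred   = [88, 94, 80, 93, 94, 94, 70, 70]
actual = [91, 96, 97, 95, 94, 97, 78, 68]
print(mae(pred, actual))                      # 4.625, i.e. the 4.63 above
print(round(kendall_tau_b(pred, actual), 3))  # 0.511
```

The other rounds can be checked the same way by swapping in their two columns.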
Ding

  • 94 Posts
  • 20 Reply Likes
I wonder what the correlation will be for the first round of a shape, though. Are you keeping track of Round 1 of the Bulged Star? I have to admit that my predictions for this one vary depending on my mood; sometimes I think we'll nail it, and sometimes I think we'll average around 70 ;)
aldo

  • 35 Posts
  • 2 Reply Likes
Lab 201 Round 1
86 83
85 89
92 87
80 79
85 90
91 84
94 93
90 90
MAE = 3.25
tau = 0.370

Over all four rounds I made predictions for:
MAE = 6.81
tau = 0.503
Berex NZ

  • 116 Posts
  • 20 Reply Likes
@aldo, shouldn't there be results for every round, from Round 1 up to Round 4?
aldo

  • 35 Posts
  • 2 Reply Likes
@Berex, I only made predictions for the four rounds above. I stopped making them when it looked pretty certain we were going to go with the pairwise comparison method; then I got curious again lately.
Berex NZ

  • 116 Posts
  • 20 Reply Likes
@jee - regarding the Elo ranking results, I'm wondering if it's possible to have a little more data:

1) How many times was each design involved in a pairwise comparison?
2) How many contradictions are there? For example, if I compared designs A, B, and C, I might vote A better than B, then B better than C, and then C better than A. Sort of like an error factor?
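The contradictions in (2) are preference cycles, and counting whether any exist takes only a small graph check over the comparison results; the `(winner, loser)` vote format here is an assumption about how the data would be stored:

```python
def has_contradiction(votes):
    """Return True if the pairwise votes contain a preference cycle,
    e.g. A beats B, B beats C, yet C beats A. Votes are (winner, loser)."""
    beats = {}
    for winner, loser in votes:
        beats.setdefault(winner, set()).add(loser)

    def reachable(start, target, seen):
        # Depth-first search: can `start` transitively beat `target`?
        for nxt in beats.get(start, ()):
            if nxt == target or (nxt not in seen and
                                 reachable(nxt, target, seen | {nxt})):
                return True
        return False

    # A cycle exists if some loser transitively beats its own winner.
    return any(reachable(loser, winner, set()) for winner, loser in votes)

print(has_contradiction([("A", "B"), ("B", "C")]))              # False
print(has_contradiction([("A", "B"), ("B", "C"), ("C", "A")]))  # True
```

Reporting the fraction of votes involved in such cycles would give exactly the "error factor" Berex asks about.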

Interesting Elo facts:
The average player went through 22 games and completed 19 comparisons.
About 16% of games were skipped, most probably due to indecision or unwillingness to compare.