New system for selecting synthesis candidates

  • Announcement
  • Updated 7 years ago
Dear players,

In the last few rounds since the public launch, we have seen a few shortcomings of our current voting system for selecting synthesis candidates. Now that we have hundreds of lab submissions, it is extremely hard to go through all the designs to pick out the best ones. In most cases, people are intimidated by the number of designs they have to look through, and often end up voting for designs that already have lots of votes ("snowballing") or basing their vote on a single metric (such as free energy). The devs met about this today and concluded that a pure voting system is not well suited to a project like ours, with a massive number of candidates.

Instead, we are now thinking of applying the Elo rating system to synthesis candidate selection. If you have seen the movie "The Social Network", you'll recognize this system right away: it was used for FaceMash. In this system, users are continuously asked to pick the better of 2 candidates. Each user decision creates a partial ordering of the candidates, and the system tries to derive a total ordering from all of the partial orderings while minimizing inconsistency. In the end, we will synthesize the top 8 designs in the total ordering.
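To make the idea concrete, here is a minimal sketch of how an Elo-style update could work for a single pairwise review. The rating scale, the K-factor, and the function names are illustrative assumptions, not the final implementation.

```python
def expected_score(rating_a, rating_b):
    """Probability that design A beats design B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(rating_a, rating_b, a_won, k=32):
    """Return updated ratings after one pairwise review.

    a_won is True if the reviewer picked design A over design B.
    k controls how strongly a single review moves the ratings.
    """
    expected_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# Example: two designs start at 1500; the reviewer prefers design A.
print(update_elo(1500, 1500, a_won=True))   # -> (1516.0, 1484.0)
```

Repeating such updates over many reviews is one way a total ordering can emerge from individual pairwise judgments.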

Instead of "voting", there will be a "review" button, where you'll be asked to pick the better of 2 randomly picked candidates. In the review interface, you'll be able to do full comparison of 2 candiates - you'll be able to see their statistics and interactively play with both deigns. You can do as many "reviews" as you want, and you'll be rewarded by how many "correct reviews" did.

The system has many great advantages. First, every design will be reviewed by someone. We can set up the system so that designs that haven't been reviewed yet are more likely to be chosen as random review candidates, which will make sure every design gets reviewed. Second, one-to-one comparison will allow players to make more in-depth decisions than having to wade through hundreds of designs. Third, the quiz-like quality of the review will stimulate people to learn and ask more before they make decisions.
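As an illustration of the "under-reviewed designs get picked more often" idea, here is a minimal sketch of weighted random pair selection, where a design's chance of being drawn shrinks as its review count grows. The weighting formula and the names are assumptions for illustration only.

```python
import random

def pick_review_pair(review_counts):
    """Pick two distinct designs, favoring those with fewer reviews.

    review_counts maps design_id -> number of reviews so far.
    A design's weight is 1 / (1 + reviews), so unreviewed designs
    are the most likely to be drawn.
    """
    designs = list(review_counts)
    weights = [1.0 / (1 + review_counts[d]) for d in designs]
    first = random.choices(designs, weights=weights, k=1)[0]
    remaining = [d for d in designs if d != first]
    rem_weights = [1.0 / (1 + review_counts[d]) for d in remaining]
    second = random.choices(remaining, weights=rem_weights, k=1)[0]
    return first, second

# Example: design "C" has never been reviewed, so it comes up most often.
counts = {"A": 12, "B": 7, "C": 0}
print(pick_review_pair(counts))
```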

The system does have a few issues. The biggest one is that it reduces the social aspect of candidate selection. For example, it will now be harder to promote your design in chat by saying "vote for my design ABCDE, I think it's cool". We plan to address this by keeping the "design browser", so you can still browse through every submitted design, review a specific design, and even start "reviews" that involve that specific design (i.e., fix one candidate of the review to be that design). Also, we could allow people to leave comments on each design when they review it, so people who come later can see them.

The details still need to be worked out. For example, how are we going to reward people based on their reviews? Do we only count a review as correct or wrong if both candidates in the review are synthesized and can be compared? If not, how can we rate reviews that involve non-synthesized designs? We are still working on these questions, and it may take some time for us to come up with a final system, but we wanted to throw this idea out to EteRNA players and see what you think.

EteRNA team

Jeehyung Lee, Alum

  • 708 Posts
  • 94 Reply Likes

Posted 8 years ago


dimension9

  • 186 Posts
  • 45 Reply Likes
wow, big changes - got to "see before say" - when is this going to start?

Deety42

  • 1 Post
  • 2 Reply Likes
This sounds like a fabulous idea to try. I think that it would be helpful to allow comments on all designs and to make those comments available to whoever is selecting between two designs. That will allow the selector to benefit from whatever observations others have had a chance to make. Also, it would be interesting for the selector to be able to make comments on each design as well, e.g., "I didn't choose this because it looked like there were too many patterns that might re-align and match up and deform the shape." That would help people whose designs are not selected understand what to work on...

alan.robot

  • 91 Posts
  • 36 Reply Likes
This is a great idea, more like a crowdsourced peer-review system and less like a popularity contest. One suggestion - there should be a maximum number of reviews that a lab member can choose to write (but perhaps no limit on randomly assigned reviews). Otherwise, one person could still unduly manipulate rankings by choosing to review every pairwise combination of their favorite submission and other submissions. Alternatively, if the number of reviews is fixed (like mod points on slashdot), it will force people to spend them wisely.

Chris Cunningham [ccccc]

  • 97 Posts
  • 13 Reply Likes
I strongly disagree -- if someone is willing to put in the time to write a brief thought about every design, they should be encouraged rather than prevented from doing so. I doubt that kind of person is going to be motivated by scoring points.

aculady

  • 10 Posts
  • 6 Reply Likes
I agree, Chris. I had actually thought of suggesting a player achievement ("The Golden Thumb" or something like that) for players who consistently reviewed large numbers of designs, and another ("The Shadow", perhaps? "Who knows what twisted patterns lurk in the strands of RNA?") for players whose reviews/predictions consistently matched lab outcomes.

alan.robot

  • 91 Posts
  • 36 Reply Likes
Well, let's remember how Elo works: it's the scoring system used in chess matches, and it assumes that the relative rankings mean someone actually beat someone else in a concrete way (i.e. a match). The problem here is that there is no actual match, just an opinion (even if well-informed), and the goal is to synthesize the most representative composite answer from all the collective feedback. There needs to be some way to normalize when different people do different numbers of reviews; otherwise the sorting will inherently bias toward the viewpoints of those with more reviews. It's not even necessarily a manipulation issue; it's that Elo assumes there is one true, universal ranking, and that's not strictly true for something based on multiple users' intuition (although that's what we are trying to take advantage of). I'm all for unlimited comments, but if you can't normalize the pairwise rankings somehow, the sorting will be inherently biased to reflect the opinions of those who do more reviews.

Christopher VanLang, Alum

  • 30 Posts
  • 9 Reply Likes
You could imagine a Quora-like system where the useful reviews get more view time and the unhelpful ones get voted down. Also, author reputation may become a factor.

Adrien Treuille, Alum

  • 243 Posts
  • 33 Reply Likes
Alan: We wouldn't necessarily use Elo, that's just an example. There are many methods to create total orderings from sets of partial orderings, and we need to investigate which constraints (e.g. normalization constraints) need to be satisfied by a proper ranking.
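As one illustration of that design space (and not the method the team has committed to), here is a minimal sketch that aggregates pairwise reviews into a total ordering while normalizing by how many reviews each reviewer submitted, so prolific reviewers do not dominate the result. All names and the weighting rule are hypothetical.

```python
from collections import Counter, defaultdict

def rank_designs(reviews):
    """Aggregate pairwise reviews into a total ordering.

    reviews is a list of (reviewer, winner, loser) tuples.
    Each review contributes weight 1/N, where N is the number of
    reviews that reviewer submitted, so no single reviewer dominates.
    Designs are ranked by their weighted win fraction.
    """
    per_reviewer = Counter(r for r, _, _ in reviews)
    wins = defaultdict(float)
    appearances = defaultdict(float)
    for reviewer, winner, loser in reviews:
        w = 1.0 / per_reviewer[reviewer]
        wins[winner] += w
        appearances[winner] += w
        appearances[loser] += w
    scores = {d: wins[d] / appearances[d] for d in appearances}
    return sorted(scores, key=scores.get, reverse=True)

# Example: "bob" does three reviews but carries no more total weight than "ann".
reviews = [("ann", "design1", "design2"),
           ("bob", "design2", "design1"),
           ("bob", "design2", "design3"),
           ("bob", "design1", "design3")]
print(rank_designs(reviews))  # ['design1', 'design2', 'design3']
```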

dimension9

  • 186 Posts
  • 45 Reply Likes
My first take on this (after reading up on "Elo," though admittedly without fully adequate familiarity) is that it operates much like a bubble sort, but with a radically inconsistent comparison algorithm (as different as every player who participates!).

This perception has caused "The Big Question" in my mind at this early stage to become:

"Will the individual comparison choices made by the player community - with their various understandings and perceptions of game dynamics, and their wildly differing ideas of what a successful design looks like - manage to "average out" into a true "wisdom-of-crowds," and result in a better selection of designs being eventually chosen each round?

... or will this inconsistency, being necessarily applied unevenly across the design-space, result instead in a choice indistinguishable from a random "eight-dart toss", or in a choice so skewed (by whatever current evaluation theory-meme seems to be dominating thought in the "player-verse" during that week's comparisons) that the end result each week could fluctuate wildly between a very positive diversity and a hopelessly muddled melange of design extremes and oddities?"

On purely the positive side, though, this system will clearly enforce at least a minimal analysis even by the least so-inclined players, and will provide a much, much greater democratic view-and-evaluation opportunity for many, many designs which in the previous system would have been simply ignored or overlooked.

It should be an INCREDIBLY interesting experiment, and the more I think on it, the more I like it.

I'm looking forward with great anticipation to participating in this new system, and to seeing how it all feels once it's in progress, and to seeing how it all works out in the end each week.

-d9

xmbrst

  • 13 Posts
  • 3 Reply Likes
The more people discuss their hypotheses about what makes a good design, the wiser the crowd will be. Ideally the game would guide people to a forum for this kind of discussion.

boganis

  • 78 Posts
  • 4 Reply Likes
"...Will the individual comparison choices made by the player community - with their various understandings and perceptions of game dynamics, and their wildly differing ideas of what a successful design looks like..."

In what respect was the old system better regarding this issue?

I really can't see why your concern wouldn't have been just as valid for the old system.

dimension9

  • 186 Posts
  • 45 Reply Likes
Hi boganis, I think you're right, & I agree the old system was not any better in this regard, and that my concern voiced here was indeed just as valid for the old system; I'm just not sure that this new system (as intriguingly interesting and different as it seems to be) will prove to be any better. But I am hoping very much that it will be.

For me (I guess because I've already come to view this system as a kind of "bubble-sort"), the "sticking point" is mostly just the idea that the "compare function" in the algorithm of this system will be applied differently each time... but for all I know, this very inconsistency may prove to be both its strength and its brilliance.

-d9

Adrien Treuille, Alum

  • 243 Posts
  • 33 Reply Likes
Boganis: That's a great point, and we fully agree.

mat747

  • 130 Posts
  • 38 Reply Likes
I see you still need to work out the details - is there a time frame for when that will be done?
I think with a total change of the "current voting system for selecting synthesis candidates" it may be wise to beta test the update before it goes mainstream.

Fomeister

  • 12 Posts
  • 4 Reply Likes
I think it is the obvious method. I would strongly disagree about allowing comments in the reviews.
While I am certainly impressed by everyone's "lessons learned", it is important to remember that the goal of this experiment is to determine whether humans can recognize patterns, not whether humans can calculate winning designs. Computers can already calculate theoretical combinations based upon the different metrics involved.

It concerns me that this is not being reiterated enough.

Adrien Treuille, Alum

  • 243 Posts
  • 33 Reply Likes
Fomeister: I very much agree with you that if sufficient metrics were known, then a computer could simply calculate them and determine which design is better. However, the metrics are unknown, and that's precisely the problem. Allowing people to comment on individual designs will allow different theories on the correct metrics to diffuse throughout the community. Ultimately our goal will be to "formalize" the best metrics so that they can be applied automatically by computers--but first the community must help us find them!

cesium62

  • 8 Posts
  • 2 Reply Likes
I thought the goal of the experiment was to design some novel, interesting RNA structures and, as a side effect, see if something could be learned about how RNA folds to improve RNA folding computer algorithms.

Putting comments on submitted sequences is critical. This way old-timers can say: "This is a Christmas tree, it won't sequence, much less fold, see documentation _here_. Don't vote for this."
I think it is critical that people with good reasons why they think a particular sequence will or won't fold correctly be able to describe those reasons on a design.

The point is not for individual humans to recognize patterns. The point is for the community of humans to recognize patterns. Once a new pattern has been recognized, it needs to be disseminated through the community without forcing each individual to discover that pattern for themselves.

Fomeister

  • 12 Posts
  • 4 Reply Likes
@cesium62 - That is precisely why comments should not be allowed. And no, the point of the project is not to create novel structures.

The "point" of the "game" can be found here, http://www.cmu.edu/homepage/computing...

cesium62

  • 8 Posts
  • 2 Reply Likes
The point of the game cannot be found on the page that link leads to.

Got it: The reason to not allow comments is to help the spread of information through the community. Makes sense to me.

Fomeister

  • 12 Posts
  • 4 Reply Likes
Sure it does. Watch the video on the page.

Chris Cunningham [ccccc]

  • 97 Posts
  • 13 Reply Likes
I've told a dev this before, but you guys should check out http://shirt.woot.com/Derby for an idea of how a discussion thread on every design can be a worthwhile thing.

Here's a random thread from there that has the kind of feedback I think we could use here: http://shirt.woot.com/Derby/Entry.asp...

aculady

  • 10 Posts
  • 6 Reply Likes
Some thoughts we have been kicking around here at my house, in no particular order...

One of the points that has been noted repeatedly is that people are not consistently integrating the information from the lab synthesis results into their designs. I would like to see another layer of idea generation and possibly voting to address this - not a design layer, but a hypothesis generation layer. This could take a few different forms in practice, and I don't have a personal favorite at this point.

One option would be to have a set of comment fields that would pop up when someone viewed the lab results asking them to suggest why they think it didn't synthesize properly, what issues they see that may have caused particular points of failure, and why they think the parts that did work did.

One way to use the data collected this way would be to have a side window that operated like a forum post or a reddit thread. After they commented (and I think it is important to require them to present their own ideas, however minimal, first), they would have access to the comments of others about that design, and, ideally, would be able to upvote or downvote those comments and reply to specific comments of others. Systematically collecting, sharing, and evaluating post-synthesis feedback that is immediately accessible while viewing the design would go a long way toward priming people to have a hypothesis-testing mindset when creating and evaluating further designs.

Another way to use the responses generated would be to rotate them into the pairwise comparison format, if the forum approach seems unworkable or too clumsy or time-consuming. "Which of these two responses best explains why this design (failed to bond / misbonded / deformed, etc.)?"

Regardless of what action is taken regarding post-synthesis comments, I think that it would be a really good idea to explicitly evaluate submissions in the 2nd, 3rd, and subsequent rounds with respect to how well they solve problems found in the first-round submissions. Instead of, or in addition to, having a simple pairwise comparison in which the player is asked "Which of these two designs is more likely to fold correctly?", I think it would be really valuable to ask players to answer "Which of these two designs is more likely to fix the specific problems seen in this synthesized design?" and "Many players believe that Design X (failed / succeeded) due to __________. Which of these two designs would best test whether that was true?" I would like to see perhaps half of the synthesis slots reserved for designs that are explicitly testing hypotheses about previous lab failures or successes.

xmbrst

  • 13 Posts
  • 3 Reply Likes
Yes yes yes to explicit hypothesis testing! Any model based on voting up individual designs will make this hard, though. "Design A will do better than design B" is a much more powerful format for testing hypotheses than "design A will do well". But that format requires voting up a pair of designs rather than just one.

Ding

  • 94 Posts
  • 20 Reply Likes
I was very skeptical about this idea when I first read about it, but I'm warming to it.

Aculady made a suggestion in chat last night that I wholeheartedly agree with: there should also be an option to rate two designs as equal, or skip the comparison altogether.

I think adding the ability for others to comment on designs is great, not just for voting purposes but also to help us all learn to design better.

One question I have is how this system will affect designs submitted late in the week -- will there be enough data generated for a design submitted only a couple hours before the design/voting deadline to give it a fair ranking?

It'll be interesting to see how it works, and I'm feeling hopeful about it :)

aldo

  • 35 Posts
  • 2 Reply Likes
An alternative would be to have players "order" designs by predicting their synthesis scores. The designs with the highest average prediction and whose number of predictions is above a certain threshold would be synthesized. The reward for predicting would be a function of the difference between the predicted and actual synthesis score.

This method still lends itself to the randomized evaluation method since you don't have to look at all the designs to predict the synthesis score for any one of them, just as you don't have to compare all of the designs to compare any two of them. It has the added benefit that A) more evaluations will be testable against lab results (the probability of any one design being tested is greater than the probability of any two designs both being tested) and B) it should be easier to get an ordering out of a list of scores than out of a relatively sparse set of comparisons (statisticians, feel free to correct/corroborate this).
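As a rough illustration of this proposal (not an official mechanism), here is a minimal sketch that averages players' predicted synthesis scores per design, applies a minimum-predictions threshold, and selects the top designs; the reward shrinks with the gap between prediction and lab result. The threshold, slot count, and names are placeholder assumptions.

```python
from statistics import mean

def select_for_synthesis(predictions, min_predictions=5, slots=8):
    """Select designs by average predicted synthesis score.

    predictions maps design_id -> list of predicted scores (0-100).
    Only designs with at least min_predictions predictions qualify;
    the top `slots` designs by average prediction are returned.
    """
    eligible = {d: mean(scores) for d, scores in predictions.items()
                if len(scores) >= min_predictions}
    ranked = sorted(eligible, key=eligible.get, reverse=True)
    return ranked[:slots]

def prediction_reward(predicted, actual, max_reward=100):
    """Reward shrinks linearly with the gap between prediction and result."""
    return max(0, max_reward - abs(predicted - actual))

# Example usage with made-up data.
preds = {"designA": [92, 88, 95, 90, 91], "designB": [70, 75, 72, 68, 74]}
print(select_for_synthesis(preds, min_predictions=5, slots=1))  # ['designA']
print(prediction_reward(predicted=90, actual=83))               # 93
```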

xmbrst

  • 13 Posts
  • 3 Reply Likes
Here's a proposal re explicit hypothesis testing (related to aculady's post):

In addition to the Elo pairs generated by the system, mix in some "hypothesis pairs" created by players. In a hypothesis pair, both designs would be by the same player, and the player would include some comments about why the pair is a potentially informative comparison.

Hypothesis pairs would be chosen for synthesis based on how much *disagreement* there was about the ranking of the pair elements, because a more interesting hypothesis is one where the answer isn't obvious.

The number of hypothesis pairs in the pair pool could be limited, with the candidate hypotheses chosen based on how well the individual pair elements did in Elo voting.
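As a hedged sketch of how "amount of disagreement" could be quantified for a hypothesis pair, one simple option is to measure how close the reviewer split is to 50/50. The scoring choice below is an assumption, not anything the team has specified.

```python
def disagreement(votes_for_a, votes_for_b):
    """Return 0..1, where 1 means reviewers were split exactly 50/50."""
    total = votes_for_a + votes_for_b
    if total == 0:
        return 0.0
    return 1.0 - abs(votes_for_a - votes_for_b) / total

def most_contested(hypothesis_pairs, k=3):
    """Pick the k hypothesis pairs whose reviews disagree the most.

    hypothesis_pairs maps pair_id -> (votes_for_a, votes_for_b).
    """
    scored = {p: disagreement(a, b) for p, (a, b) in hypothesis_pairs.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]

# Example: pair "p2" is the most contested (6 vs 5).
pairs = {"p1": (10, 1), "p2": (6, 5), "p3": (9, 3)}
print(most_contested(pairs, k=1))  # ['p2']
```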

aculady

  • 10 Posts
  • 6 Reply Likes
I really like the idea of synthesizing either single designs or pairs where there is strong controversy. Controversy indicates either that there are things about the design parameters that are not understood at all under those particular conditions, or that there are two or more competing paradigms in the community regarding how things work that yield different theoretical results under the same conditions.

Edit: This would, of course, be in addition to synthesizing some designs where there was strong consensus.

cesium62

  • 8 Posts
  • 2 Reply Likes
Per something aculady wrote: once synthesis candidates are selected and until the synthesis results are available, we should be encouraged to assign orderings (or guess the synthesis scores) to those selections.

Fomeister

  • 12 Posts
  • 4 Reply Likes
And? That doesn't conflict with the new scoring system.
Discussion, whether it be about ordering or guessing synthesis scores, is a great thing. However, these types of discussions should go into the "peanut gallery" or hypothetical-type discussions. There is collaboration, which sometimes takes the form of 3 scientists at the bar, and there is analysis.
Two different parts of the same process.

That is why I believe the new system is better, but comments should be avoided when the analysis "moment" comes. Certainly individuals will, and should take into account all discussions, but in the end the point of the project is _not_ to be a popularity contest.

aculady

  • 10 Posts
  • 6 Reply Likes
I really like the fact that we are crowdsourcing the game development to some degree, as well as the RNA algorithms.

dimension9

  • 186 Posts
  • 45 Reply Likes
Since my initial impression-airing, I have been devoting some thought to practical implementation issues, and have come up with a preliminary list of questions that I felt would need to be addressed "pre-start-up" (that is, of course, aside from the obvious need for significant interface infrastructure enhancements to be designed, programmed, and tested):

1) Since this system seems to operate much like a simple sorting algorithm, how will new entries be inserted into the sort once it is already in progress (at the bottom? the top? the center of the existing order?)

2) How will late entries be handled in order to receive a fair exposure? (Those in the last 3 days, 2 days, 1 day; last 3 hours, 2 hours, 1 hour)?

3) In short, the above two issues illustrate that conducting Elo Pair Review during the design creation cycle would reproduce the same issues that the current system already suffers from.

So, in reference to the above concerns, it strikes me that the preliminary implementation of the "alternating" lab system, as previously proposed here:

http://getsatisfaction.com/eternagame...

...in advance of implementing Elo, could significantly ease the transition to the Elo System.

...however, this "alternating week" system could also be adapted to significantly facilitate the new Elo Pair Review System as well, BUT this would mean the addition of a third Lab Cycle, "Lab C," thereby lengthening the cycle by a week.

This scheme would address the above concerns, albeit at the cost of drawing out the cycle to a third week; however, once you take a look at the benefits, perhaps this will not seem too steep a price.

Number 1 above would cease to be an issue, since all entries would be received During Week One before synthesis cut off, meaning that the Elo algorithm would at least be working on a full, complete, stable data-set. During the first week of design, there would be NO Elo Pair Comparisons done; the week would be devoted totally to design creation and submission.

Then at the beginning of Week 2, after design submission cut-off for "Lab A", the Elo Pair Comparison Process would begin for "Lab A", and proceed for that whole week, while "Lab B" would simultaneously begin ITS design phase. During this second week of "Lab A's" Elo Pair Review of designs, there would be NO more design submissions for "Lab A"; the week would be devoted totally to Elo Pair Review. This would give the Elo Pair Review Process more than adequate time for the best designs to "bubble" up to the top, thereby allaying any possible concerns of inadequate pair review affecting results. Also, since all designs would have been submitted prior to cut-off the previous week, there would be no concern about the last and latest entries not receiving a fair shake in the review process. Concurrent with the Elo Pair Review for "Lab A," now remember, the design phase for "Lab B" would be beginning and going into full swing.

At the End of Week 2, the Results of the Elo Pair Review for "Lab A" would be complete and would be published, and the top 8 designs sent to the Lab for Synthesis. "Lab B" would be winding up ITS design phase and getting ready to go to ITS Elo Pair Review Week.

Finally in Week 3, with the "Lab A" Elo Pair Review now completed and the Winners sent off to be synthesized, "Lab A" would close for the week during the synthesis.

Simultaneously, "Lab B" would be completing ITS Elo Pair Review, and would in turn be preparing to send ITS Top 8 Results to Synthesis.

Meanwhile, "Lab C" would then open for ITS design submission phase, and the whole process would begin again for all 3 Labs.

Here is a quick diagram of the Elo Pair-Review-Enhanced Lab Flow Proposal: [diagram attached in the original post, not reproduced here]

I think the added organization, and separation of effort achieved in creating these clear-cut phases may be more than worth the cost of lengthening the cycle to a third week to accommodate the Elo Pair Review Process.

Thanks & Best Regards,

-d9

aculady

  • 10 Posts
  • 6 Reply Likes
I think that this is an absolutely excellent proposal.

Benjamin Callaway

  • 4 Posts
  • 0 Reply Likes
It seems like your proposal is a relatively complicated process, and not necessarily worth it as it may introduce as many new problems as it solves.
For example, I think being able to review designs while they are being created is very important to the design process, as it allows people to see the problems in their designs and fix them based on provided feedback. In your system, there is no opportunity to review the designs as they are created, and once they can be reviewed, the changes that should be made cannot be.

Personally, I think simple systems generally work better, and I don't see that your "1" issue will really be an issue. A good design, even if created near the deadline, would still bubble up to the top relatively quickly if it consistently got positive votes.

Perhaps a combination of systems would do best, such as shortening it to a two-week cycle and locking out new submissions 1 day, or several hours, before the final choices are made?

Adrien Treuille, Alum

  • 243 Posts
  • 33 Reply Likes
A nice thing about this system is that we don't need to worry about where to "insert" new designs, nor when they're created because the ordering is well-defined no matter when solutions get compared. Of course we may want to include some sort of "stability" in the system so that newly submitted designs aren't sorted until they receive a sufficient number of comparisons. But in general, this approach may allow us to weaken the importance of "rounds:" we would simply pop the top 8 solutions off the totally ordered list and synthesize those.

Fomeister

  • 12 Posts
  • 4 Reply Likes
I think it bears repeating that the only test that matters is the lab test. There are absolutely zero people in this group, or in the lab, who know the elements of a design across all shapes.

Too many people are trying to design a process without considering the goal of the project.

Again, the point of the new review process is to determine how the reviewer rates the designs comparatively.

"...problems in their designs and fix them based on provided feedback."

The design of the software controls _known_ problems in designs. Feedback from other participants should be taken for what it's worth: an opinion. That opinion should be shared, and people should certainly collaborate on their designs.

However, that is a distinctly separate issue from "reviews", "votes", or judging.

SpaceFolder

  • 10 Posts
  • 0 Reply Likes
I think the "Elo" method mentioned above is ill-advised. Many people will not want to take the time to compare designs. Although I believe a statistical based approach is warranted, it must be simple for people to grasp and have a "common sense" feel to it.

Here's my suggestion:

1) A computer program counts all the submissions. Let's say, for example, 100 submissions.
2) The computer program then counts how many of these submissions were submitted by people who have already had past submission(s) selected AND scored above 90. Let's say, for our example, 23 of the 100 submissions.
3) These 23 submissions are automatically entered into the new trials.
4) The computer program then randomly selects an arbitrary OR pre-defined number of the remaining 77 submissions. Let's say it's predefined at 5 submissions.
5) The end result is that, for our example, 28 go for lab analysis.

This method removes any and all subjectivity and is based on past performance (the 90-or-above criterion). Certainly, someone could have achieved a 90 or above based on luck and perhaps never get a 90 or above again. This could be remedied by a time constraint on how long someone remains in the 90-or-above selection preference.
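For concreteness, here is a minimal sketch of the selection rule described above: automatically include submissions from players with a past synthesis score above 90, then add a fixed number of randomly chosen others. The data shapes and the quota value are illustrative assumptions.

```python
import random

def select_candidates(submissions, past_high_scorers, random_slots=5):
    """Apply the proposed rule.

    submissions maps design_id -> author.
    past_high_scorers is the set of players with a prior synthesis
    score above 90. Their submissions are auto-included; random_slots
    additional designs are drawn at random from the rest.
    """
    auto = [d for d, author in submissions.items()
            if author in past_high_scorers]
    rest = [d for d in submissions if d not in auto]
    extra = random.sample(rest, min(random_slots, len(rest)))
    return auto + extra

# Example with made-up data: 2 auto-included + 2 random picks.
subs = {"d1": "alice", "d2": "bob", "d3": "carol", "d4": "dave", "d5": "erin"}
print(select_candidates(subs, past_high_scorers={"alice", "carol"},
                        random_slots=2))
```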

People simply aren't going to take the time to do lengthy comparisons. People come to this site to have some fun and help science progress...not to have a part-time, unpaid job. People like to compete on an intellectual level and EteRNA offers that opportunity. But it's not a good idea to have people like myself exiting this site feeling like getting a design selected for further analysis is essentially futile.

Here's the reality, I received a nice chunk of points for selecting the latest winner...BumpyDiggs. But I didn't put any thought into it at all...I just jumped on the snowball bandwagon, all the while complaining that the voting scheme needs to be changed.

Thanks for the opportunity,
SpaceFolder

mrrln

  • 1 Post
  • 0 Reply Likes
But maybe people will take the effort if they score points when they vote? For example, 10 points per vote, and when the lab results get back, you get 20 points for each "correct" comparison.

Then you would also get rid of the problem of people voting for popular designers' submissions just because they are popular.

Another factor that could remove popularity voting is to anonymize the submissions in some way.

SpaceFolder

  • 10 Posts
  • 0 Reply Likes
If I had to choose between your suggestion and the status quo, I would choose yours. I'm not saying yours is the ultimate solution. I'm saying the status quo is VERY bad. But you can't make EteRNA feel like a part-time, unpaid job. People will lose interest if you do. Maybe EteRNA needs to get a sociology or psychology major on the staff to regain the bleeding edge. The chat number seems to be declining again. I use that number, along with membership, to monitor the interest level.

Thanks for sharing,
SpaceFolder
5/31/11

Rudy Limpaecher

  • 1 Post
  • 0 Reply Likes
I like the approach; however, since we cannot optimize a design by computer, the only real judgment is the synthesis process. Therefore, we should not select a design through marketing or a fancy title.
The review should be completely blind, with the reviewer seeing only a number. Any design that is marketed, by calling up friends to select it, should be disqualified.
RL

Sklavit

  • 1 Post
  • 0 Reply Likes
As some research suggests, one-to-one comparison is the only really workable way to order decisions. And even with one-to-one comparison, people have a high chance of making mistakes if two very different objects are compared.

Berex NZ

  • 116 Posts
  • 20 Reply Likes
I hope the scoring system also gets rebuilt.
10 Christmas trees scored 94% or higher in Round 4.
16 GC pairs is not similar to 33 GC pairs.

Or else we'll only be dealing with half the issue.

Madde

  • 45 Posts
  • 17 Reply Likes
Maybe you could add a threshold of at least x% correlation to get the score of a synthesized design.

Chris Cunningham [ccccc]

  • 97 Posts
  • 13 Reply Likes
Yup, this seems like the obvious solution.

aculady

  • 10 Posts
  • 6 Reply Likes
What are the criteria for determining similarity? Maybe this should be one of the things users are asked to rate: "How similar are these two designs?" When I try to determine if two designs are similar, I don't think my first impulse is to go through and count bases. I think my first impulse is to look at stack structure, to look at the points near where loops and stacks meet, and then to look at the categories of bond types and how they are being used (serially, alternately, irregularly, etc.).

iulian

  • 6 Posts
  • 0 Reply Likes
I subscribe to the Elo method, yet I would like to propose an important addition:
It would be nice to record all the hypotheses being made during each round (see the sketch after this list for one way such hypotheses might be checked).

  • A hypothesis is represented by a set of user-defined restrictions and a metric

  • When designing a solution, people should be able to create their own standard goals for the design. (example: no 2 consecutive Gs, minimum 4 G-C pairs, ...)

  • Users should also be able to define more complex goals (example: energy being between certain values, limits on the types of stacks that can be used, ...)

  • For each hypothesis the user defines the metric of the design. By using some predefined macros, he should be able to compose a comprehensive formula that will represent the metric.

  • The users vote using the Elo method on the set of rules that form the hypothesis and not on the actual design. Then only the top 10 hypotheses are selected.

  • Every design should be tested against the selected hypotheses and the best ones should be synthesized.
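A rough sketch of what "testing every design against the selected hypotheses" might look like, assuming each hypothesis is a set of simple predicate restrictions over a design's sequence and pairing. The restriction format, data shapes, and names are purely illustrative.

```python
def no_consecutive_gs(seq, run=2):
    """Restriction: no run of `run` or more consecutive G bases."""
    return "G" * run not in seq

def min_gc_pairs(pairs, minimum=4):
    """Restriction: at least `minimum` G-C base pairs in the design."""
    return sum(1 for a, b in pairs if {a, b} == {"G", "C"}) >= minimum

def satisfies_hypothesis(design, restrictions):
    """A design passes a hypothesis if it meets every restriction."""
    return all(check(design) for check in restrictions)

# Example: a hypothesis built from the two restrictions above.
design = {"seq": "GCAUGCGCAU",
          "pairs": [("G", "C"), ("C", "G"), ("A", "U"), ("G", "C")]}
hypothesis = [lambda d: no_consecutive_gs(d["seq"]),
              lambda d: min_gc_pairs(d["pairs"], minimum=3)]
print(satisfies_hypothesis(design, hypothesis))  # True
```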


Chris Cunningham [ccccc]

  • 97 Posts
  • 13 Reply Likes
Awesome idea. Seriously, awesome.

Of course implementation is terrifyingly difficult. For instance one of the #1 things I look for when evaluating a design is this problem:

http://s3.amazonaws.com/satisfaction-...

I'm sure other people have various things like that that they look for. I bet most "hypotheses" are, like mine, too difficult to code?

GAH I could have sworn I got an image into a reply once before but I guess you have to live with a link for now.

aldo

  • 35 Posts
  • 2 Reply Likes
If we're only going to synthesize the top eight (not counting multiple entries from the same player), shouldn't we make sure every comparison includes at least one design from the top eight at the time? All we need to know is whether a given design belongs in the top eight or not; comparing the current #20 to the current #50 seems to be a lot less useful for that purpose than comparing either of them to the current #6. This would also help with the reward problem, since making sure one design in each comparison is from the preliminary top eight increases the odds that both designs will be in the final top eight.
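As a small sketch of this pairing idea (again, just an illustration with assumed names), each comparison could draw one design from the current provisional top eight and one from the rest of the pool.

```python
import random

def pick_focused_pair(ranked_designs, top_n=8):
    """Pair one design from the current top_n with one from the rest.

    ranked_designs is the current provisional ordering, best first.
    """
    top = ranked_designs[:top_n]
    rest = ranked_designs[top_n:]
    if not rest:                      # pool too small: fall back to any pair
        return tuple(random.sample(top, 2))
    return random.choice(top), random.choice(rest)

# Example with a made-up ranking of 12 designs.
ranking = [f"design{i}" for i in range(1, 13)]
print(pick_focused_pair(ranking))
```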

Fomeister

  • 12 Posts
  • 4 Reply Likes
Again, everyone here should be aware that one of the _key_ reasons for this project is to determine how we, humans, can recognize patterns, NOT to solely apply criteria (free energy, Christmas trees).
These are things that could quite simply be prevented from inclusion, BUT there is a reason that they are not.

Samuel Johnson

  • 4 Posts
  • 2 Reply Likes
I agree that voting is too often done by popularity, guesswork or copying others. The job of studying past synthesis results and analyzing submissions is more work than most will do.

Several people have mentioned hypotheses and it seems to me that making predictions is what we are really trying to do here.

I suggest that, similar to how we have to learn and show our skill in order to be able to submit designs, we should have to learn and show our skill at predicting results in order to be able to vote.

I propose that rather than voting, we make a specific prediction for how a design will actually fold during synthesis. Just mark the design blue and yellow to match what we think the synthesis results will be.

This prediction is also a score and that score is our vote. The caveat is that our predicted score is weighted based upon how accurate our previous predictions have been.

This could be done using the current RNA Lab voting screen, but instead of voting, the viewer can select any design they want and looking at the design, mark each nucleotide blue or yellow in an attempt to match what the lab results will show when/if it is synthesized.

After synthesis, the predictions would be graded based upon the number of correct nucleotide predictions as well as how close the predicted failure location is to the actual failure. The proximity of the failure location must be included to prevent someone from just randomly marking 3 failure nucleotides and beating the averages.

Then each person's scores for all the designs that they have made predictions for are averaged. Designs that are predicted for but are not synthesized are not graded. This gives incentive for people to make predictions about potentially successful designs and also prevents someone from just choosing the worst designs and labeling all the nucleotides as fail, thus padding their score.

Rather than casting an actual vote, the prediction process is the voting. Every design will have several predicted results, each one being a predicted score. Each predicted score would be weighted based upon the past accuracy rating of the person making the prediction.

Thus, the most promising designs have a high predicted score where the most historically accurate predictors have their scores weighing more heavily. Then the top scoring designs are chosen for synthesis.
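As a hedged illustration of the grading idea described above, here is a minimal sketch that scores a per-nucleotide prediction against lab results and weights each player's predicted score by their historical accuracy. The weighting scheme, data encoding, and names are assumptions, not a spec.

```python
def grade_prediction(predicted, actual):
    """Fraction of nucleotides predicted correctly (both strings of 'B'/'Y')."""
    assert len(predicted) == len(actual)
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

def weighted_design_score(predictions):
    """Combine players' predicted scores, weighted by past accuracy.

    predictions is a list of (predicted_score, past_accuracy) pairs,
    where past_accuracy is each player's historical grade in [0, 1].
    """
    total_weight = sum(acc for _, acc in predictions)
    if total_weight == 0:
        return 0.0
    return sum(score * acc for score, acc in predictions) / total_weight

# Example: 'B' = folded as designed (blue), 'Y' = misfolded (yellow).
print(grade_prediction("BBYB", "BBYY"))                 # 0.75
print(weighted_design_score([(92, 0.9), (60, 0.3)]))    # 84.0
```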

Unfortunately, the only way to get a design submitted for synthesis now is just publicity. So here's my ugly and shameless self promotion.

I've combined only sections of the One Bulge Cross that successfully folded in round 4 and that can be combined without any alterations. The resulting design is made totally of tested sections.

Here's the breakdown: the right leg is from 1337 by Fabian, the top and left legs are from Donald's nephew 2 by madde, the bottom leg is from -45.5 kcal 107 M.P. by donald, and the center section is from Ding's Mod... by Ding.

All these parts folded successfully in round 4 and fit together without any modifications. This should be a very good experiment because even if they don't fold correctly when put together as a single design, it will be very informative to see why a section that folded correctly before doesn't fold again a second time.

So please vote for Ankh Will Fold! and I apologize for begging.
Thanks!

iulian

  • 6 Posts
  • 0 Reply Likes
I like your idea about predicting how a design would perform in the test tube but you spoiled a very good post with the last paragraph.

Berex NZ

  • 116 Posts
  • 20 Reply Likes
Ok, I was able to grab a few details from Jee about the Elo system.
I'm sure the following could change, but I thought everyone might like to know the general gist of what is being planned. Or my grasp on it anyways.

It's scheduled to arrive around the end of the month.
When in Elo, all your decisions will be tracked and recorded.
At the end of the week, the top 8 designs will be synthesized.
Using those 8 designs, they will go back and highlight all the comparisons you did which involved BOTH of the synthesized designs.
There won't be rounds anymore, but there still will be designs sent weekly for synthesis until a winner (over 94%) is found.
Which means your designs won't be deleted every week, they will stay there till they either get synthesized or the lab ends.
Because of this, we are likely to see the design limit go from 3 to 10.
We are also likely to see the designer score go up.
And you will get 500 points per correct comparison. Since a comparison can only be judged when both of its designs are synthesized, at most 28 of your comparisons (the number of distinct pairs among the 8 synthesized designs) can be worth points each week. So with reviewing alone, you can have a maximum gain of 28 × 500 = 14,000 points per week.

Chris Cunningham [ccccc]

  • 97 Posts
  • 13 Reply Likes
Make sure you don't set it up so that everyone has a point incentive to randomly vote on every pair. Perhaps +500 for each correct comparison, -500 for each incorrect comparison, down to a minimum of 0. Otherwise you will see people click every comparison randomly.