Looking for Player Input on Current Lab Round (R107)

  • 2
  • Question
  • Updated 3 years ago
The current lab round, released early this week, has started off with an enormous bang!  As I write this, there are already 2288 submissions, almost 2/3 of the allocated slots.  Yet there are still over 5 weeks before the scheduled cutoff date, and many dedicated players haven't even gotten started yet.  It looks like this will be the first round since Johan introduced his lab-on-a-chip experiments in which players will once again be competing for synthesis slots.

This is tremendously exciting for the future of Eterna.  But it does raise the question of how we should best allocate synthesis slots for this round.  We discussed various aspects in the dev meeting today, but so many interrelated issues were raised that we decided to get input from more players before making any decisions.

Let me start with some background information:
  • The puzzles in this round are intended to gather data for one or more scientific papers. So the scientists have been very actively involved in creating new puzzles they would like to get data on.
  • The plan has been to release a second project this week that has 16 new puzzles that use an RNA reporter as output, instead of the MS2 aptamer used in the currently active project.
  • 7200 synthesis slots have been allocated for player submissions for this round.  This is fewer than typical because more have been reserved for designs generated by research bots. 
  • The existing puzzles were posted with a limit of 50 submissions per player.  This morning, that was reduced to 30.  But some players have already submitted more than that.
The issues that were raised and discussed in the dev meeting included:
  • Is 28 puzzles in one round spreading players' efforts too thin?
  • Is 6 weeks too long for a round, given the current submission rate?
  • Assuming that there will be more submissions than available synthesis slots, how should they be allocated? (For the benefit of newer players, the technique we used to use was user voting combined with brute-force trimming as needed.  In the early days, when no more than 10-20 designs could be synthesized for each puzzle, each player could vote for up to three designs, and the designs that got the most votes were the ones synthesized. As the number of designs per puzzle grew, there were more synthesis slots than designs getting even a single vote, so an algorithm would be applied after the round ended to decide the maximum number of entries allowed per player, and truncate the submission lists of those players who had submitted more than that.)  If we restore voting as the means for selection, does it need tweaking, given the huge number of designs we have now?  Would it help to restore incentives, in the form of points, for voting?  If we do that, should we restore points for submissions as well?  And if we do that, should we retroactively award points for all the labs that haven't been rewarded yet?  And if we want to make any changes, do we have the developer resources to actually make it happen? :-)
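For newer players, the old vote-based selection described above can be sketched roughly as follows. This is an illustrative reconstruction, not Eterna's actual code; the function name and data shapes are hypothetical, and it assumes the early-days cap of three votes per player:

```python
# Illustrative sketch (not Eterna's actual code) of the early vote-based
# selection: each player votes for up to three designs, and the designs
# with the most votes win the synthesis slots.
from collections import Counter

def select_by_votes(votes, slots):
    """votes: list of (voter, design_id) pairs; returns chosen design ids."""
    per_voter = Counter(voter for voter, _ in votes)
    assert all(n <= 3 for n in per_voter.values()), "max three votes per player"
    tally = Counter(design for _, design in votes)
    # Ties keep first-seen order, as Counter.most_common does.
    return [design for design, _ in tally.most_common(slots)]

picked = select_by_votes(
    [("ann", "d1"), ("ann", "d2"), ("bob", "d1"), ("cai", "d3"), ("cai", "d1")],
    slots=2,
)
# → ["d1", "d2"]: d1 has three votes; d2 edges out d3 only by first-seen order
```

The sketch deliberately omits the later trimming step, which only kicked in once submissions outgrew the voted pool.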
There were probably even more ideas thrown out, but this should be enough to convey the scope of discussion.  The objective at hand is to decide how we can best optimize players' efforts for this round, given the constraints on developer resources and experimental requirements.

The floor is now open for comments. :-)

Omei Turnbull, Player Developer

  • 980 Posts
  • 308 Reply Likes

Posted 3 years ago


LFP6, Player Developer

  • 613 Posts
  • 109 Reply Likes
I'd say voting on thousands of solutions is unreasonable. I'd love to see bundles implemented, so there is some way to present specific research goals, though I do think having a couple of guaranteed slots for individuals would be really beneficial (e.g., for new players). The ELO rating system Eterna used at one point would be cool for getting players to vote and making their votes more effective (especially with a point incentive), but there is the issue of how to teach players which designs to pick (many new players probably don't have any idea which ones might work over others, and even then there can be benefit in failing designs, as has been brought up in earlier discussions).

Providing points for lab submissions is something I think many have always wanted to see come back (both retroactively and for future puzzles I'd think). Of course, I think there should be larger measures for revamping points, but I've gone into that at length before (though my most recent thoughts are only on the player-led development Slack, not on the forum, should probably post them at some point). Points for voting is also a good idea if you want to get people to vote.

eternacac

  • 274 Posts
  • 19 Reply Likes
So that explains my sudden limit to 30-31 submissions. Rats, I was just getting going.

I often thought it was first come first served. But that doesn't allow for "better players" to take their time if "lesser players" are eager and early.

Perhaps an algorithm to rank each player's total submissions by their probability of being of the highest quality, and then pro-rate the slots on that basis? Someone who very often submits scores of 95% or better may have more general utility than me, and so might warrant more slots whenever they are scarce.
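As a purely hypothetical illustration of this pro-rating idea, here is a sketch that uses each player's mean historical synthesis score as the quality weight (the weighting choice and all names are invented, not anything Eterna does):

```python
# Purely hypothetical sketch of pro-rating scarce slots by player quality,
# using each player's mean historical synthesis score as the weight.
def prorate_slots(mean_scores, total_slots):
    """mean_scores: {player: mean past score}; returns {player: slot quota}."""
    weight_sum = sum(mean_scores.values())
    raw = {p: total_slots * s / weight_sum for p, s in mean_scores.items()}
    quotas = {p: int(r) for p, r in raw.items()}
    # Largest-remainder rounding so the quotas sum exactly to total_slots.
    leftover = total_slots - sum(quotas.values())
    for p in sorted(raw, key=lambda q: raw[q] - quotas[q], reverse=True)[:leftover]:
        quotas[p] += 1
    return quotas

quotas = prorate_slots({"ann": 95.0, "bob": 80.0, "cai": 65.0}, total_slots=30)
# → {"ann": 12, "bob": 10, "cai": 8}
```

Largest-remainder rounding is used here just so no slot is lost to truncation.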

Omei Turnbull, Player Developer

  • 980 Posts
  • 308 Reply Likes
Hi @eternacac.  Good to have you active again!

It is true that more experienced players will often ask for a personal bump in their quota when slots would otherwise go unused. But in my experience, those same players are the ones who are most vocal about finding ways to get more players more involved, not increasing their own allotments.  I think the more experience one gets, the easier it is to appreciate the value of diversity in the submissions.

quantropy

  • 7 Posts
  • 4 Reply Likes
I think that an important question is: what is the time-limiting factor in this process? I can think of several candidates:

1: Lab equipment time: Experiments can only be performed at a given rate
2: Academic time.  The purpose is to provide data for a scientific paper, and there's a limit to how quickly such papers can be written.
3: Administration time.  Keeping Eterna going involves quite a bit of work, and it would be hard to do this any faster.
4: Player time: Players need 6 weeks to provide the best entries.

Omei Turnbull, Player Developer

  • 980 Posts
  • 308 Reply Likes
Yes, these are all very relevant.  The first three are real constraints, but beyond players' control.  It's really the last one that we as players have control over.

I think we can take it as a given that the six weeks is when we need to have all designs in hand for the next synthesis round.  And you are probably right that most players would prefer just to leave all the projects open until then.  But, for example, we could instead hold off releasing the new project and do the projects sequentially. The advantage would be that players could concentrate their efforts on a smaller set of puzzles at any one time.  We're always looking for ways to facilitate collaboration between players, and that might have a positive impact in that regard.

Pi

  • 19 Posts
  • 2 Reply Likes
I don't really understand how the voting idea is supposed to work. We don't know what will and won't be a good design, right? I mean, if there is a metric to determine a better design, then it should be implemented in the game itself. Or is there a metric used, but already brute-forced by research bots, so that players are only supposed to come up with innovative designs that are not covered by that metric?

I don't see a point in accepting new designs once you reach the maximum number of submissions you can handle (if there is no way to auto-evaluate and sort them). Just close the project or redefine it. In my opinion there should be an extremely hard lab project that is always open when there is no other project to do... So that players can have fun solving it and there isn't a two months' gap with constant countdowns.

Omei Turnbull, Player Developer

  • 980 Posts
  • 308 Reply Likes
@Astromon, @Pi, @LFP6  I had seen my proposal as being very favorable to designers (even though they would have to "share" the reward with their supporters) because for all their designs that didn't get any votes, they would get all the reward. But as your comments point out, things would be more transparent if we just treated designing awards and voting awards as distinct.

As for sharing a "pot" for each design equally among those who voted for it, it's analogous to parimutuel betting.  Previously, voting was very subject to the snowball effect because many players would not be looking at the designs, but would vote for the same designs that others had, with the presumption that the previous voters actually knew what they were doing.  Not surprisingly, there was not a very high correlation between votes and scores.  With parimutuel betting, following the crowd is counter-productive for the individual.
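A minimal sketch of the parimutuel analogy, assuming the "pot" a design earns is split evenly among its backers (function and names are hypothetical, not a proposed implementation):

```python
# Minimal sketch of the parimutuel analogy: whatever "pot" a design earns
# is split evenly among the players who backed it, so crowding onto a
# popular design dilutes each backer's payout.
def parimutuel_payout(pot, backers):
    """pot: reward earned by the design; backers: players who voted for it."""
    share = pot / len(backers)
    return {player: share for player in backers}

# Five players follow the crowd onto one design; one player backs another:
crowd = parimutuel_payout(100.0, ["a", "b", "c", "d", "e"])  # 20.0 each
solo = parimutuel_payout(100.0, ["f"])                       # 100.0 for "f"
```

The point of the analogy: the solo backer of a good design earns five times what each crowd-follower does, which is exactly the incentive against snowballing.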

Astromon

  • 192 Posts
  • 26 Reply Likes
As of now, and for the last 7 or 8 labs I have participated in, there are no points for designers or voters, so any point system put in is a plus. That said, I think designers should get points for good designs (so many for an 80, 90, 100). The voters can get points for voting for a good design, but it shouldn't affect the points that the designer gets (and it would be much less than the designing player's points).  Is that not doable?  Thanks!
(Edited)

Pi

  • 19 Posts
  • 2 Reply Likes
@Omei Turnbull You could also make the voting private to avoid the snowball effect. If you don't show cast votes at all during the voting process, then you'll know for sure that they don't affect other votes. And you can reveal them again when the project closes.

Brourd

  • 452 Posts
  • 82 Reply Likes
@Omei I don't understand how your proposal would encourage players to look at designs to distinguish good ones from bad ones. If a player is only given 40% of their total slots as votes, that would mean that for the hundreds of designs submitted for each target, a player would have, given 30 design slots, 12 votes. If there are a hypothetical 300 solutions submitted for a lab puzzle, their vote would only account for 4% of all submitted sequences. Given that the average score for most sequences in puzzles like these has typically been between 60 and 75, as @Pi stated above, as a player there is about as much incentive to comb over the details of the thermodynamic, kinetic, chemical, and structural characteristics of a sequence to ascertain whether or not it's a good switch as there would be to randomly place my votes using a random number generator. This would also detract from the appeal of game playing, given the serious time investment required to do any of these things. One of the most important aspects of games that require significant time investments is a feeling of progression, whether that progression is an artificial inflation of the player's status or a gain in skill from better understanding the mechanics of the game.

Hence, I would agree with @LFP6 that an ELO system for comparing sequences would honestly be a better system, but you run into another problem. What's a good switch? How many players, when given two sequences, are likely to pick the "better" sequence of the two? If my memory serves me right, the Eternabot for switches hasn't done particularly well; based on the strategies players proposed, it has had about the same success as the original, which may be indicative of a more widespread issue.
(Edited)
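For reference, the ELO idea mentioned by LFP6 and Brourd would operate on pairwise "which design is better?" judgments. A minimal sketch with the conventional (but arbitrary) parameters; nothing here reflects how Eterna's old rating system actually worked:

```python
# Sketch of an Elo-style rating over pairwise "which design is better?"
# judgments; K=32 and the 400-point scale are the conventional defaults.
def elo_update(r_winner, r_loser, k=32.0):
    """Return updated (winner, loser) ratings after one comparison."""
    expected_win = 1.0 / (1.0 + 10.0 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta

# An upset (the lower-rated design wins) moves both ratings by ~24 points;
# an expected win would move them much less.
a, b = elo_update(1400.0, 1600.0)
```

One appeal of pairwise comparison is exactly Brourd's concern: players are only ever asked to judge two designs at a time, rather than to vote over hundreds.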

Brourd

  • 452 Posts
  • 82 Reply Likes
I would probably say that there needs to be a place where players can "train" in order to better identify good switches, before any change to the voting system.

Astromon

  • 192 Posts
  • 26 Reply Likes
I want all my designs to be tested. And to be honest, the thought that some may not be has demotivated me from even continuing to design for the lab.

Astromon

  • 192 Posts
  • 26 Reply Likes
(great idea PI!) Hi lfp6!

Omei Turnbull, Player Developer

  • 980 Posts
  • 308 Reply Likes
@astromon In the recent past, the dev team has tried to set the individual quotas so that 1) all submissions got synthesized, and 2) we made good use of the synthesis capacity.  (There isn't a hard limit on how many distinct designs can be measured, but the more designs synthesized, the less precise the data is for each design.)  But there's no way to know ahead of time how many players will be participating, and how many designs they will feel like creating.  So the admins for different rounds have taken different approaches.  For the one lab where Ely and I were the admins, we purposely set the initial quotas low, and raised them gradually on a puzzle-by-puzzle basis as players asked for them.  But that took more active attention than making an informed guess at the beginning and living with the consequences.

In this particular round, there were originally fewer puzzles planned, and the informed guess for the quota was going to be 100.  But as more puzzles got added, the guess was lowered first to 50 and then to 30.  Unfortunately, there was ambiguity about who was going to make the last change, and nobody noticed right away that it had been left at 50.

For the few times (in the recent past) that there were just too many submitted designs to get good data on all of them, the selection algorithm was to lower the puzzle quota (after the round was closed) as much as it needed to be in order to "fit" the available slots.  Any design that had at least one vote would not be cut.  All other cuts were designs by players who had exceeded the ex-post-facto quota for that puzzle.
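That ex-post-facto trimming could be sketched like this. It is a hypothetical reconstruction from the description above, not the actual admin script:

```python
# Hypothetical reconstruction of the post-round trimming: find the largest
# per-player quota such that the kept designs fit the available slots.
# Designs with at least one vote are never cut.
from collections import defaultdict

def trim_to_fit(designs, slots):
    """designs: list of (player, design_id, votes), in submission order."""
    def kept_with(quota):
        count = defaultdict(int)
        kept = []
        for player, design_id, votes in designs:
            count[player] += 1
            # Voted designs are exempt; unvoted ones must fit the quota.
            if votes > 0 or count[player] <= quota:
                kept.append(design_id)
        return kept

    quota = len(designs)  # start with no effective limit
    while quota > 0 and len(kept_with(quota)) > slots:
        quota -= 1        # tighten until everything fits
    return kept_with(quota)

# Player "a" submitted three designs (one voted), player "b" two (none voted):
kept = trim_to_fit(
    [("a", "a1", 1), ("a", "a2", 0), ("a", "a3", 0), ("b", "b1", 0), ("b", "b2", 0)],
    slots=3,
)
# Note: a uniform quota can undershoot; here quota 1 fills only 2 of 3 slots.
```

The undershoot noted in the last comment is an inherent property of a single uniform quota, and is one reason the algorithm is worth revisiting.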

We don't have to follow this same algorithm going forward; that's (one aspect of) what this discussion is about.  But in discussing it, we probably do need to distinguish between what we can do in the time frame for this round and what we would like to develop for the future.

Astromon

  • 192 Posts
  • 26 Reply Likes
"  All other cuts were designs by players who had exceeded the ex-post-facto quota for that puzzle" i like this algorithm;
Also i lke your thinking on making voting a game in itself and would help new players and others learn what does make a design look good. Thanks for your answers.

Pi

  • 19 Posts
  • 2 Reply Likes
@Omei Turnbull Oh well. As somebody with 420 designs submitted should I delete 60 of them now, or wait for the lab to fill? I certainly don't want them to be randomly chosen. Do you notify relevant players before this cut so they can react? I mean when I started I had no idea that there were other engines available in the lab so my first designs were Vienna only and so I wouldn't feel that bad deleting these.

Btw is the maximum number of slots 3600 or 7200? The first number is advertised in the lab page and the second one on the home page. Maybe these numbers should be somewhat more unified.

Omei Turnbull, Player Developer

  • 980 Posts
  • 308 Reply Likes
@Pi I would say that if you want to replace some of your early designs with better ones, go ahead and start doing that now.  But otherwise,  you might as well wait until things are more settled.  For example, @Brourd asked if it would be possible to limit this lab round to just the puzzles that are currently available, thus doubling the number of available slots for these puzzles.  I don't think that's feasible, but I'm not the one who would make that decision, so I have brought the question to Rhiju's attention.

As for the numbers, the current plan is to release another set of puzzles (i.e. a second project) that also has an allocation of 3600 slots.  Together, the two projects would have 7200 slots in this lab round.

Brourd

  • 452 Posts
  • 82 Reply Likes
Would it be possible to split this single round of experimental targets into two individual sequencing runs. It may also be useful to end this first round in about two weeks to account for the additional time necessary to run the experiments and analyze the sequencing data. This would allow for a greater number of sequences per player in each round, while maintaining a schedule similar to the current plan.

Omei Turnbull, Player Developer

  • 980 Posts
  • 308 Reply Likes
Having a faster turnaround time on returning results to players would certainly be nice!  But I think decreasing the time between synthesis rounds probably isn't realistic right now. A lot of effort and expense goes into each experimental round, and the effort is (to a large extent) independent of how many designs are being processed.

But really, @johan or @rhiju would be a better source than I for addressing this.

LFP6, Player Developer

  • 613 Posts
  • 109 Reply Likes
I think he was saying ending one early so that two rounds aren't being synthesized at once/reducing the backlog (correct me if I'm wrong here).
(Edited)

Brourd

  • 452 Posts
  • 82 Reply Likes
LFP6 is correct. It's not so much about shortening the turnaround time, but just running the MS2 switches at an earlier time, given there are currently no Eterna rounds in the experimental pipeline afaik. If the next opening for the sequencer isn't until the end of this round, or if the budget or work required doesn't allow for it, then that's a different set of circumstances.

johana, Researcher

  • 96 Posts
  • 45 Reply Likes
Thanks for all the comments. We are really excited about the current round, since we hope to demonstrate that the great results for the FMN/MS2 switches can be generalized to any combination of aptamer and reporter. That is the reason for the late additions to the puzzle set. In the previous round we noticed that same-state switches seemed easier, and that placing the reporter between the aptamer arms was better. We want to test this using different puzzles with the same aptamer. I was very impressed by the large number of solutions already. The limiting factor for experiments is both cost and time, and separating the round into two experiments doubles both.
We are trying to gauge the excitement among the players so we can better match the experiments with the designs. Would more slots fill up, similarly to the fantastic results so far, such that adding more capacity would be welcomed by players?

whbob

  • 193 Posts
  • 58 Reply Likes
The players have the ability to supply solutions (as they are doing in R107).
I think their desire to play, plus your desire for puzzle data, equals: add more capacity.
From a practical standpoint though, some tasks may need small capacity and some large.
With the first OpenTB round (R104), for the easy A/C & B/C puzzles, scores of 80-100% came after about 400 to 500 submissions.  R105 (Round 2) had high scores in the first 20 submissions. In R106 (Round 3), high scores for A/C came in the first 20 submissions, and for B/C at about the 100th submission.
I think that feedback matters a lot. First rounds may need to be large just to cover as much diversity as possible.  With feedback, each additional round may need less capacity because bad designs are weeded out early.  

Would more capacity mean just more slots or longer sequences?  My guess is just more slots.
 

Omei Turnbull, Player Developer

  • 980 Posts
  • 308 Reply Likes
Re more slots or longer sequences, there are chips that could do either or both.  But Johan was asking about increasing slots for the current round; the sequence length won't be changing. 

Astromon

  • 192 Posts
  • 26 Reply Likes
I think the extra sub-labs will be welcomed and filled within the six weeks allowed. If not a couple extra weeks can be added. By all means add them as soon as possible, it will be a good test within itself to see if 26 labs will fly!

Astromon

  • 192 Posts
  • 26 Reply Likes
 Considering the cost and time saving factor I think this can be done. Thanks!

whbob

  • 193 Posts
  • 58 Reply Likes
It's sad to think that players' creativity might be randomly reduced by eliminating their submissions. The fact that players can supply way more than the lab can consume is a credit to the players, to be sure. Maybe letting the current 12 + 16 puzzles play out, with a max of 50 submissions for the first 12 and 30 submissions for the second, and closing the lab when it reaches 7000 (or whatever the lab max limit is), would be the way to go.  There are about 20 players or fewer in the puzzles now.
I'm still trying to sort out R104,5 and 6, so I'm not very active in R107.  For me, the time between the Rounds is not enough to get and analyze the spreadsheets, fusion tables graphs and 2D structures. I don't mind playing in the fast lane, but when I come to the fork in the road, I'd like to have some feedback as to which way to turn :)  

LFP6, Player Developer

  • 613 Posts
  • 109 Reply Likes
Just want to put it out there that I'd prefer even less time between rounds - at the very least, there should always be SOMETHING for players just finishing the tutorials to work on.

Omei Turnbull, Player Developer

  • 980 Posts
  • 308 Reply Likes
@whbob To be sure, it's sad to see players' creativity reduced, either by cutting their submissions or by not letting them submit their designs in the first place.  This really gets to the question that Johan asked above.  Do we think there will continue to be enough player involvement at this point in Eterna's evolution to justify increasing the number of slots per lab?  (Basically, by buying bigger chips.) If you have more thoughts on that, it's probably best to write them as a comment to Johan's post, where they will be certain to catch his eye.

Omei Turnbull, Player Developer

  • 980 Posts
  • 308 Reply Likes
@LFP6 That's always the goal.  For something like the final round of OpenTB, it should be pretty easy to achieve because the puzzles are already defined.  It was uniquely hard for the current round because Johan and Nando had to sift through the literature to figure out what aptamers for multiple small molecules would make for appropriate puzzles, meaning they had to satisfy diverse constraints in both the game and the lab experiment.

Omei Turnbull, Player Developer

  • 980 Posts
  • 308 Reply Likes
@rhiju reminded me about another idea, one that Will Greenleaf had at the last dev meeting, that would be an alternative to bringing back some form of voting. That idea is to ask a player, when they are submitting a design, to give an estimate of what its Eterna score will be.  The reward for the design would then take into account not just the score, but also how close the prediction was.

If you're wondering why this would be an alternative to voting, it is because we are not looking at bringing back voting primarily as a way of influencing which designs are synthesized, but as a way of seeing (and improving) how well players understand what makes a good switch.
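As a purely hypothetical sketch of how such a prediction-sensitive reward might combine the two terms (the weights of 100 base points and 50 bonus points are invented for illustration, not anything the devs have decided):

```python
# Hypothetical reward rule: base points scale with the Eterna score, plus a
# bonus that scales with how close the player's own prediction came.
# The weights (100 base, 50 bonus) are invented for illustration.
def design_reward(score, predicted, base=100.0, bonus=50.0):
    """score and predicted are both on the usual 0-100 Eterna scale."""
    accuracy = 1.0 - abs(score - predicted) / 100.0  # 1.0 = perfect prediction
    return base * score / 100.0 + bonus * accuracy

good_and_honest = design_reward(score=90.0, predicted=85.0)  # → 137.5
overconfident = design_reward(score=40.0, predicted=95.0)    # → 62.5
```

The design choice to note is that the bonus term rewards calibration, not optimism: a player who honestly predicts a mediocre score for a mediocre design still collects most of the bonus.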

What do you think about that idea?

Astromon

  • 192 Posts
  • 26 Reply Likes
Yeah, it makes more sense to do it in the second stage of each round, after results. In the first stage of a round it would be more of a guessing thing, I would think. However you guys decide to do it is fine, though!

Omei Turnbull, Player Developer

  • 980 Posts
  • 308 Reply Likes
In this case, I doubt it could even be done in this first round, because the devs couldn't get it ready that quickly.  But I certainly expect experienced players will do much better with these new ligands than they did in the first round of the FMN switch, because collectively we have learned quite a bit about RNA switches. At least some of that knowledge, and hopefully quite a lot, will generalize to these novel ligands.

Astromon

  • 192 Posts
  • 26 Reply Likes
Oh, are these the same puzzles? I did okay on the fifth round (R101) of FMN riboswitches:   http://prntscr.com/e0irqi

I think round two was my first lab; I just missed the first one. I thought these seemed familiar!  I have made all new designs, so it will be interesting to see how these scores compare to the old ones! I will go ahead and guess my high score will be better than before: 89-99ish  (:

Omei Turnbull, Player Developer

  • 980 Posts
  • 308 Reply Likes
The similarity between the current puzzles and round 101 is that they use the formation of the MS2 hairpin as the output, as opposed to the reporter RNA in the OpenTB puzzles.

(When we are using MS2 as the output indicator, the hairpin tertiary structure, when it forms, binds to a moderately large MS2 protein, which in turn is covalently bound to a fluorescent molecule.)

Astromon

  • 192 Posts
  • 26 Reply Likes
Thanks for this information! So these use MS2, as opposed to the reporter in TB, for output indication. Interesting, thanks!

eternacac

  • 274 Posts
  • 19 Reply Likes
Filling slots... is a pre-synthesis task with the goal of improving synthesis results. An objective measure would be desirable. A proven correlation to better synthesis scores would also be desirable. Finally, it must be immediately available to all players from within the game presentation itself, i.e., something the game itself shows you, not something calculated off-game or by a script that not everyone has running.

So, what does the game show me that I can use? In this round we have 5 green or red boxes common to each puzzle that I use: state 1 & reporter, state 2 & reporter, state 1, fixed site state 1, fixed site state 2. In addition, a % AU box is available. There are currently 3 solving engines (Vienna, Vienna2 and NuPACK), so a total of 15 boxes to score, excluding the % AU box, which is reserved for ties or other discrimination. The "4 nt's in a row" penalty box is also excluded, or used for exceptions.

A perfect score would be 15 green boxes, i.e., all 3 solving engines agree. (Some current solves meet this criterion.) More often 2 of 3 engines agree and some portion of the 3rd's boxes are green. So, a max score of 15 boxes is available. Often only 1 engine solves and sometimes none. (Thinking outside these boxes is penalized at this point which may be resolved later.)

Count first the solving engines, 3, 2, or 1. First pass rank 3>2>1.
Next count total boxes green. Second pass rank 15>14>13...etc.
Next rank boxes (and errors) on relevance to the problem. A bound molecule in state 2 with a fixed state is better than a bound molecule and a failed fixed state. (This measures the switchiness of the fixed state.) We want our solutions to switch and bind the molecule. Partial fixed/unfixed sites can be evaluated by the % of correct nt's highlighted.

Next, did the reporter form/not form in state2? We may bind the molecule and switch the fixed site but get no indicator. Same for state1. Does the reporter site switch?

Rank reporter site2 > reporter site1 or vice versa as applies to puzzle.

Assign points and totals. Decide on energy use: does |-E1| > |-E2| really matter, or is it also a tie-breaker?
Assign points and score.

This is how I am currently evaluating designs, with additional consideration of how many nt's need to change in a design to make it work in the missing solving engines. If only one engine is missing and only one nt needs to change, then that is a very near miss! I also consider the secondary shapes as displayed by the game, giving 6 total points to perfect agreement in both states. So, max score 6, min score 0. (This has actually happened in my recent submissions.) Finally, I look at the dot plots and see if they mostly agree on the major structure probabilities, and they often do agree (even NuPACK).

Someone should implement code for such an objective method and then correlate it to synthesis scores, especially of winners, to gauge usefulness.

Still it would be an objective scoring method for slot assignment.

A fuzzy logic representation of the above algorithm would be helpful.
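A rough sketch of the first two ranking passes described above (engine names from the post; it assumes an engine "solves" when all five of its boxes are green, which is my reading, not a game rule, and it omits the later relevance and reporter passes):

```python
# Rough sketch of the ranking described above: sort first by how many
# engines fully agree (all five of their boxes green), then by the
# total count of green boxes out of 15.
def rank_key(green_boxes):
    """green_boxes: {engine: number of green boxes, 0-5}."""
    engines_solving = sum(1 for n in green_boxes.values() if n == 5)
    total_green = sum(green_boxes.values())
    return (engines_solving, total_green)

designs = {
    "d1": {"Vienna": 5, "Vienna2": 5, "NuPACK": 5},  # all three engines agree
    "d2": {"Vienna": 5, "Vienna2": 5, "NuPACK": 3},  # two agree, one partial
    "d3": {"Vienna": 5, "Vienna2": 2, "NuPACK": 1},  # only one engine solves
}
ranked = sorted(designs, key=lambda d: rank_key(designs[d]), reverse=True)
# → ["d1", "d2", "d3"]
```

The remaining passes (box relevance, reporter formation, energy tie-breakers) would slot in as further elements of the sort key tuple.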

eternacac

  • 274 Posts
  • 19 Reply Likes
An example from my current round, XTheo B #30, as a guide picture for the scoring idea above. I would score it a perfect score. YMMV. ;~)


Astromon

  • 192 Posts
  • 26 Reply Likes
Thanks for the detailed process you incorporate into your designs!
It sounds like a great system! I have heard non-switching static stems in designs are a good idea, so I try to put them into my designs w/ the hashtag #non-switching-area.