Lab Scoring - a Mystery?

  • Question
  • Updated 7 years ago
  • Answered
How are lab scores determined? I'm completely confused. In FMN Switch 2.0 my entry, "Enterprise Round 3", got a score of 62. I wanted to see how the highest scoring "Summers Back" (92) got 30 points more so I could hopefully learn something to improve my next entry. This is where I got totally baffled.

The folding results of "Enterprise Round 3" looked very close to the bound target! Of the 24 pairing NTs, only 4 did not match up properly, so I figured I had only a little tweaking to do and would look to "Summers Back" for some clues. I was astonished to see that the "Summers Back" folding results looked significantly different from the bound target and the pairing NTs did not line up at all!

Clearly, I have no clue how the scoring is done!! Can someone please explain?

scifun

  • 7 Posts
  • 0 Reply Likes
  • mystified.

Posted 7 years ago


Brourd

  • 454 Posts
  • 84 Reply Likes
Hi scifun,

The scoring for the switch lab is based on how each base shifted in color. With the normal color view, this isn't seen as easily, so it is best to use continuous colors for the experimental data, which can be turned on under game options.

If a base goes from a loop to a loop, like base number 15 in this current lab, it is supposed to switch from yellow in the first state to yellow in the second state.

If a base goes from a loop to a stack, like base numbers 20 or 44, it is supposed to switch from lighter in the first state to darker in the second state with the continuous colors.

If a base goes from a stack to a stack, like bases 10-12 or 52-54, it is supposed to switch from blue to blue.

If a base goes from a stack to a loop, like bases 13 and 16, it is supposed to switch from darker in the 1st state to lighter in the 2nd state with the continuous colors.

Now, what if a base goes from a loop to a loop but starts out dark blue in the first state and switches to light blue in the second state? Does it still get a full point? I don't know. It could be that the base gets a full point for switching but is really only scored half a point because it is still not considered a "loop."

Hopefully in the future, we get a full list of bases that score full and half points, in order to make all this a little easier on ourselves :)
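As a rough illustration of the four cases described above, the expected color behavior can be written as a lookup keyed on a base's target structure in each state. This is only a sketch: the names `EXPECTED_SHIFT` and `expected_shift` are made up for this example and are not part of EteRNA, and the half-point question raised above is not modeled.

```python
# Hypothetical lookup: (structure in state 1, structure in state 2) -> expected
# color behavior with continuous colors, as described in the post above.
EXPECTED_SHIFT = {
    ("loop", "loop"): "yellow -> yellow",
    ("loop", "stack"): "lighter -> darker",
    ("stack", "stack"): "blue -> blue",
    ("stack", "loop"): "darker -> lighter",
}


def expected_shift(state1, state2):
    """Return the expected color behavior for one base."""
    return EXPECTED_SHIFT[(state1, state2)]


# Example bases quoted in the post for this lab.
examples = {
    15: ("loop", "loop"),
    20: ("loop", "stack"),
    44: ("loop", "stack"),
    10: ("stack", "stack"),
    13: ("stack", "loop"),
    16: ("stack", "loop"),
}

for base, (s1, s2) in sorted(examples.items()):
    print(f"base {base}: {s1} to {s2}, expected {expected_shift(s1, s2)}")
```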

scifun

Thank you for this explanation. I'm going to go back to these two designs and study them according to what you have said. Hopefully, a light will dawn!

JR

  • 241 Posts
  • 21 Reply Likes
Still seems strange.
Below I record the stack to stack pair matches for 5 scores.

The switch to switch match is:
9,10,11,12-17,18,19,20-27,28,29,30 matches with
without synth:
55,54,53,52-47,46,45,44-38,37,36,35
score 92:
64,60,59,58-53,52,51,48
score 86:
64,63,59,58-53,52,51,48
score 58:
55,54,53,52-47,46,45,28-xx,44,43,xx
score 54:
55,54,53,52-47,46,45,44-38,37,36,xx
score 47:
55,45,55,53,52-47,46,45,44-38,37,36,35
The matches for the 2 top scores show a 6 nt slide when compared with the "without synth" positions. The lowest scores show the best match. Strange.
Is that what you really want?
I would think scores should be switched.
Makes the dot plot meaningless.
It is like buttoning your shirt off one button, saying it is 95% Ok so calling it good to go.
Also, doesn't this flat model translate into the 3-dimensional knots we see in texts that interface with DNA and such? I don't see how it would work with everything off by 6 nts.
To make sure the voting masses don't go "barking up the wrong tree," it seems to me that scores should be dropped by 50% if the first stack to stack match doesn't pair up right.
This would get rid of designs that score high by use of a slide in nts.
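The 6 nt slide mentioned in this post can be checked with a few lines of arithmetic. The sketch below assumes that the eight partner positions listed for the score 92 and score 86 designs line up, in order, with the first eight "without synth" partners (those for bases 9-12 and 17-20). That alignment is an assumption made for illustration; the post does not state it outright.

```python
from collections import Counter

# Partner positions copied from the post (first two stacks only).
without_synth = [55, 54, 53, 52, 47, 46, 45, 44]
score_92 = [64, 60, 59, 58, 53, 52, 51, 48]
score_86 = [64, 63, 59, 58, 53, 52, 51, 48]


def partner_offsets(observed, reference):
    """Difference between observed and reference partner positions."""
    return [obs - ref for obs, ref in zip(observed, reference)]


for label, observed in (("score 92", score_92), ("score 86", score_86)):
    offsets = partner_offsets(observed, without_synth)
    shift, count = Counter(offsets).most_common(1)[0]
    print(f"{label}: offsets {offsets}, most common shift {shift} ({count} of {len(offsets)})")
```

Under that assumption, most partners in both high-scoring designs sit 6 positions away from the "without synth" partner, which is consistent with the slide described above.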

scifun

JR, I agree that the scoring still seems strange. I did go back to the 2 designs in my original example, but it still didn't make sense. Thought it was just me. Maybe not.

Your detailed analysis is most interesting. It would be nice if an EteRNA tech would weigh in here!

janelle

  • 29 Posts
  • 5 Reply Likes
I, too, would appreciate a printed explanation of how the lab synthesis is graded.

Eli Fisker

  • 2266 Posts
  • 509 Reply Likes
In this forum post, Rhiju explains how a specific lab puzzle gets scored based on the raw data: "The switch puzzles so far and where we need to get."

I know the picture changes with each lab as to which sections are counted as switching areas, and thus which areas are scored. But this should give some idea of how the score is distributed.

And this post explains a little more: "How to read the raw switch data."

Hope this helps a little.

janelle

Thank you so much Eli for your post, it is greatly appreciated. : )

scifun

At this point I have read everything that was suggested to me plus anything else I could find that seemed at all related, and I still do not understand the switch lab scoring. I have done some analysis and I have the following details to explain why I still have a problem with it.

First, let me make it clear that I am not “picking on” the design Summers Back. I compared this with the Enterprise Round 3 results simply to see what the differences were between the top scoring design and mine, which scored 30 points lower.

My analysis covers different aspects of lab scoring, as detailed below. I used the documents recommended, the lab results for the two designs, and the raw data for them as well.

1. Certain nts should move from paired in the unbound form to unpaired in the bound form, or vice versa. According to the raw data these nts are 9, 13, 16, 20, 22-25, 27-29, 31-33, 35-36, 39-40, 44, 48, 51, 55, and 57-60. Below are pictures of the lab results for Enterprise Round 3 and Summers Back, on which I have marked whether or not these nts made the appropriate move.

https://docs.google.com/open?id=0B9cugLeo9M5CRnhpclgtU21UU0k

https://docs.google.com/open?id=0B9cugLeo9M5CREYxY2JiQ2MxX1k

To sum up, 23 of the 26 nts made the correct move in Enterprise Round 3 (ER3), while only 7 of the 26 did in Summers Back (SB).

2. In the raw data, the bands under the nts that are supposed to move from paired to unpaired or from unpaired to paired are supposed to lighten in intensity in the direction to which they should move. Sometimes this is very clear, and sometimes it isn't, but I read that even a slight change in the right direction counts. So, I made the best judgments I could as to whether or not the band shadings moved in the right direction. I should also note that the columns don't match up perfectly, and this makes some of the comparisons more difficult. As nearly as I can tell, the shading is going in the right direction in 12 out of 26 instances in the ER3 raw data and 13 out of 26 times in the SB data, nearly equal.

3. Another measure of success is how a lab result's nt pairs match up to the target's nt pairs. Following is a listing of the pairings in the target, in ER3, and in SB.

Target pairings
9-55
10-54
11-53
12-52
17-47
18-46
19-45
20-44
27-38
28-37
29-36
30-35

ER3 pairings
9-55 ✓
10-54 ✓
11-53 ✓
12-52 ✓
13-18 x
19-47 x
20-46 x
21-45 x
22-44 x
27-38 ✓
28-37 ✓
29-36 ✓
total correct = 7 of 12

SB pairings
9-64 x
10-60 x
11-59 x
12-58 x
13-57 x
16-54 x
17-53 x
18-52 x
19-51 x
20-48 x
21-47 x
22-46 x
24-45 x
25-44 x
32-43 x
33-42 x
total correct = 0 of 16
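The tallies above can be reproduced by treating each pairing as a tuple and counting how many observed pairs also appear in the target. This is a minimal sketch of that bookkeeping, using the pair lists exactly as given in this post; `count_matches` is a name invented for the example and says nothing about how the lab score itself is computed.

```python
# Pairings copied from the lists above.
TARGET = {
    (9, 55), (10, 54), (11, 53), (12, 52),
    (17, 47), (18, 46), (19, 45), (20, 44),
    (27, 38), (28, 37), (29, 36), (30, 35),
}
ER3 = [
    (9, 55), (10, 54), (11, 53), (12, 52), (13, 18), (19, 47),
    (20, 46), (21, 45), (22, 44), (27, 38), (28, 37), (29, 36),
]
SB = [
    (9, 64), (10, 60), (11, 59), (12, 58), (13, 57), (16, 54),
    (17, 53), (18, 52), (19, 51), (20, 48), (21, 47), (22, 46),
    (24, 45), (25, 44), (32, 43), (33, 42),
]


def count_matches(observed, target):
    """Count observed pairs that are also present in the target."""
    return sum(1 for pair in observed if pair in target)


for name, pairs in (("ER3", ER3), ("SB", SB)):
    print(f"{name}: {count_matches(pairs, TARGET)} of {len(pairs)} pairs match the target")
```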

4. There is also the matter of color changes, using the continuous colors, to be considered. I looked at this but couldn't really determine much of substance from this criterion. For the most part, the blues and yellows look to be pretty much where they should be, neither perfect nor terrible in either case. I don't feel that I have a good objective way to evaluate the ER3 or the SB results this way.

Finally, although this is not listed as a criterion and it is also not easy to apply an objective standard to, a visual comparison of each lab result shows that the ER3 result more closely resembles the target.

This is why lab scoring remains a mystery to me. I have read that it is currently being reconsidered and possibly revised. I look forward to hearing the details.

JR

From this discussion, one thing I learned is that matching the correct pairs in the model isn't necessary for a good score. Apparently, scoring doesn't care how the design switches, just as long as the raw data says it switches. Exact nt to nt matches between the two shapes aren't necessary for this scoring system; they are nice if they work, but they aren't taken into consideration when scoring. So a "right" nt can match up with a "wrong" nt, show the correct switch, and get the point. Get enough right/wrong matches showing the correct switch on the raw switch data sheet and you win, along with a really ugly synth picture.

If I am wrong with the above, please pipe in and correct me.

jandersonlee

  • 555 Posts
  • 130 Reply Likes
As I understand it, the current scoring scheme is:

The scoring doesn't look at all the bases, only the ones that are expected to switch. If there is an estimated 10% change in the desired direction (bonded->unbonded or unbonded->bonded), that base is given full marks for switching!

Bases that are supposed to be bonded or unbonded in *both* modes are *not* scored. So it could be that a base is supposed to be in a loop (unbonded) both times, but ends up (partially) bonded in one or more states without penalty.
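To make the scheme described above concrete, here is a minimal sketch of how such a score might be computed. It is not the actual EteRNA scoring code. The field names (`state1`, `state2`, `unpaired1`, `unpaired2`) are hypothetical, and reading "an estimated 10% change" as an absolute change of 0.10 in an estimated unpaired fraction between 0 and 1 is an assumption made for this example.

```python
def switch_score(bases, threshold=0.10):
    """Percent of switching bases that move far enough in the desired direction.

    Each base is a dict with hypothetical fields:
      state1, state2: target structure ("paired" or "unpaired") in each state
      unpaired1, unpaired2: estimated unpaired fraction (0..1) in each state
    Bases whose target structure is the same in both states are not scored.
    """
    scored = [b for b in bases if b["state1"] != b["state2"]]
    if not scored:
        return None
    passed = 0
    for b in scored:
        change = b["unpaired2"] - b["unpaired1"]
        if b["state2"] == "paired":
            # Desired direction is toward less unpaired signal.
            change = -change
        if change >= threshold:
            passed += 1
    return 100.0 * passed / len(scored)


# Made-up numbers purely for illustration.
example = [
    {"state1": "paired", "state2": "unpaired", "unpaired1": 0.2, "unpaired2": 0.6},
    {"state1": "unpaired", "state2": "paired", "unpaired1": 0.7, "unpaired2": 0.3},
    {"state1": "unpaired", "state2": "paired", "unpaired1": 0.5, "unpaired2": 0.45},
    {"state1": "paired", "state2": "unpaired", "unpaired1": 0.3, "unpaired2": 0.5},
    {"state1": "unpaired", "state2": "unpaired", "unpaired1": 0.9, "unpaired2": 0.2},
]
print(switch_score(example))
```

In this made-up example, the last base changes a great deal but is ignored because it is not expected to switch, and three of the four scored bases clear the threshold. Nothing in the sketch checks which partner a base pairs with, which would be consistent with the earlier observation that a design can score well without matching the target pairings.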

Even with this forgiving scoring scheme we are *still* having problems getting the designs to be scored above 95%. I'm hoping that:

a) we will soon get the ability for players to design switch puzzles, so that we can gain more experience with in-silico switches. Having access to the puzzle maker is a great help in learning how bonds and loops and stacks work. I expect that having access to a switching puzzle maker will enable similar exploration and learning for switch puzzles.

b) the cloud lab starts soon so that we can try multiple simple switch designs (of varying expected difficulties) in parallel and maybe get more of an understanding of what works and what does not.