
Results of the DREAM olfaction prediction challenge are out!

  • Thanks organizers and congratulations to everyone who did well in the final results. I have a few questions that I hope will aid in subsequent collaboration:

    @Organizers: Would it be possible to get a breakdown of final scores by percept as done on the leaderboards? This will help us determine to what degree different approaches make sense for each.

    @Clara/Yuanfang Guan: Would you be willing to share your feature set B, which consists of indicators of quadruplets? (I may have missed it, but I do not see these data anywhere in your Synapse folders.) If I am understanding your description correctly, this is a function of the row number of the molecular descriptors, which raises a question about generalizability to future chemicals.

    @IKW AllStars/Rick Gerkin: How exactly did you do your cross-validation to optimize your forest models and how did your CV results compare to your results on the leaderboards? I'd like to try to replicate some of your findings and explore ensembling your best forest models with some of the best models I found like ridge regression. And, just for fun, what does "IKW" stand for?

    Looking forward to further discussions about this project!

    Russ
    • Thanks Rick and congratulations on some great work!

      - I'd like to try shadowing your approach and cross-validate your best forest models with my own SAS-based code, then compare with some other models using the same combined training matrix. I was just looking at your .ipynb and may have missed it, but I do not see how you performed CV, e.g. did you do 5-fold or 10-fold? Did you stratify the target variable in the hold-out sets, or were they simple random splits?

      - My current thinking is that a half-ellipse function is slightly better than a parabola for the stddevs.

      - Agree on valence/pleasantness; I found it and its stddev to be the most challenging targets.

      - I fit quite a few PLS models and they sometimes did well in terms of CV performance, but typically not quite as well as kernel-based methods like ridge regression. Only a few PLS models made it into my final ensembles (which usually combined 3-6 models). I only included a few basic default forest models, so I would really like to see if the optimized ones you found will boost performance.
    • Russ,

      -- There is an rfc_cv function in my fit1.py and fit2.py files. That has the default values for my CV splits. Those are overridden in my exploration.ipynb notebook, although that notebook may be incomplete and represents only a subset of the exploratory analysis I did. The splits were random.
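
      Schematically the CV was just plain random K-fold over molecules; here is an illustrative sketch (not the actual rfc_cv code -- the fold count and forest settings below are placeholders):

        # Illustrative only -- not the actual rfc_cv from fit1.py/fit2.py.
        # Plain random (unstratified) K-fold CV of a random forest regressor.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import KFold

        def random_cv_score(X, y, n_splits=10, n_estimators=100, seed=0):
            """Mean Pearson correlation of out-of-fold predictions across random folds."""
            kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
            scores = []
            for train_idx, test_idx in kf.split(X):
                rf = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
                rf.fit(X[train_idx], y[train_idx])
                pred = rf.predict(X[test_idx])
                scores.append(np.corrcoef(pred, y[test_idx])[0, 1])
            return np.mean(scores)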

      -- Half-ellipse might be right. I found that std ~ a*mean^0.5 + b*mean^2 fit better than std ~ a*mean^1 + b*mean^2. When I worked it out theoretically, however, it seems like the quadratic should be better. This is the basic non-stationary fluctuation analysis result, e.g. in Hille's "Ion Channels of Excitable Membranes", which exploits the mean-variance relationship of bounded experimental variables to extract underlying parameters about ion channels.
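
      The comparison itself was nothing fancy; a least-squares fit of each candidate form on the training (mean, std) pairs is enough to reproduce it, roughly like this (illustrative sketch, variable names are made up):

        # Illustrative sketch: compare std ~ a*mean**0.5 + b*mean**2
        # against std ~ a*mean + b*mean**2 by linear least squares.
        import numpy as np

        def fit_std_model(mean, std, exponent):
            """Fit std = a*mean**exponent + b*mean**2; return (coefficients, residual SS)."""
            A = np.column_stack([mean**exponent, mean**2])
            coef, *_ = np.linalg.lstsq(A, std, rcond=None)
            resid_ss = np.sum((std - A @ coef) ** 2)
            return coef, resid_ss

        # coef_sqrt, err_sqrt = fit_std_model(train_means, train_stds, 0.5)
        # coef_lin,  err_lin  = fit_std_model(train_means, train_stds, 1.0)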

  • Hi, Russ,

    Both of my feature sets, A and B, are included in my single folder.

    This feature set can be generalized.

    But you misunderstood it; or, more likely, my description was not clear. For example, in my set B, a chemical with the name propane will have the following features labeled as 1: prop, ropa, opan, pane. Of course, a chemical with a longer name will have more features set to 1. Such names were included in the training set.

    It is easily applied to any chemical, as ALL chemicals in PubMed have names.

    I believe my submission was the best that could be gotten out of this training data. Tuning may yield slightly better performance by chance, but to substantially improve over this, one would need external data/software.

    Thank you, and thanks to all participants for participating.

    Yuanfang Guan

    http://guanlab.ccmb.med.umich.edu

    http://a2genomics.com

    Update: my 2 cents: I think what made the 2-standard-deviation gap between me and second place is the 0.2 + 0.8 operation between the global and individual predictions, because my submission didn't have much of an advantage in sub-challenge 2 for the average predictions. I think the top 4 in sub-challenge 2 are very similar, with less than a 0.1 standard deviation difference; number 5 might be slightly further away. But I am sure the organizers will give a better answer to this question from a more statistical/scientific point of view.

    Thanks

  • Thank you, Yuanfang, for the clarification on your feature set B--I understand it now as a kind of semantic fingerprint, and it is indeed generalizable and clever! I'd like to give it a try. If I am understanding your description correctly, the prediction from this feature set makes up half of your full prediction, which is a distinguishing characteristic of your approach.
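
    Just to check my reading of it, here is a minimal sketch of how I would build those indicators (my interpretation only, with a hypothetical helper name, not your code):

      # My reading of feature set B: a binary indicator for every 4-character
      # substring of a chemical's name (hypothetical helper, not Yuanfang's code).
      def name_ngram_features(name, n=4):
          """Return the set of length-n substrings of a chemical name."""
          name = name.lower()
          return {name[i:i + n] for i in range(len(name) - n + 1)}

      print(sorted(name_ngram_features("propane")))
      # ['opan', 'pane', 'prop', 'ropa']

    One binary column per substring observed in the training names would then give the indicator matrix.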

    I fully agree with the strategy of shrinking the individual predictions towards the mean values--this is a classical approach to reducing mean squared prediction error, which is closely related to the Pearson correlation. It would be good to learn how you arrived at the 0.2 and 0.8 weights, and also more details on how you implemented your decision trees.
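
    In code the shrinkage is just a convex combination; something like this sketch is what I have in mind (assigning 0.2 to the individual component and 0.8 to the global mean is my assumption, not necessarily how you applied it):

      # Shrink each subject's prediction toward the population-mean prediction.
      # The 0.2/0.8 split is illustrative; which component gets which weight
      # in the actual submission is an assumption here.
      import numpy as np

      def shrink_toward_mean(individual_pred, population_pred, w_individual=0.2):
          """Convex combination of per-subject and population-average predictions."""
          individual_pred = np.asarray(individual_pred, dtype=float)
          population_pred = np.asarray(population_pred, dtype=float)
          return w_individual * individual_pred + (1.0 - w_individual) * population_pred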

    Interesting that both you and Rick Gerkin stuck strictly with tree methods--he appears to have grown some thick random forests.

    Congratulations!

  • Thank you to the organizers and the other competitors. I think many of us had similarly strong submissions. Since we used different approaches, I hope we can continue to improve further with the best insights from each team.

  • Dear participants, be advised that you can continue to use the leaderboards to further improve your models.

    Thanks for the interest.

    Pablo