
Concern about synergy scores and evaluation strategy

According to the data description, a synergy score represents the amount of extra effect compared with a baseline additive effect. Synergy scores around zero should therefore correspond to drug combinations that have no synergistic or antagonistic effect on a particular cell line. However, we found that the synergy scores of drug pairs without any synergy can vary widely, and that similar scores can be associated with both synergistic and non-synergistic drug pairs.



These figures were generated with Combenefit from the raw data file BCL2_BCL2L1.EGFR.HCC1395.Rep1.csv. From left to right, the panels show (a, b) the dose-response curves of the two drugs; (c) the expected additive effect of the combination calculated with the Loewe model; (d) the observed effect of the drug combination; and (e) the synergistic effect of the combination, obtained by subtracting (d) from (c). Panels (a) and (b) clearly show that neither drug is active on its own. The matrix of observed effects (d) also shows that the combination has no effect on this cell line: the cell population varies between roughly 90 and 100%. Yet the reported synergy score for this pair is 25.22752.
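
To make panels (c) and (e) concrete, here is a minimal sketch of how a Loewe additive expectation and an expected-minus-observed synergy value can be computed at a single dose pair. It assumes, hypothetically, that each monotherapy follows a Hill curve with known parameters and that both drugs share the same maximal effect; the function names and parameters are illustrative only, and this is not Combenefit's implementation, which fits its own models to the data.

```python
def hill_inhibition(dose, ec50, hill, emax=100.0):
    """% inhibition produced by a single drug on a Hill curve (assumed model)."""
    return emax * dose ** hill / (ec50 ** hill + dose ** hill)


def hill_inverse(effect, ec50, hill, emax=100.0):
    """Dose of a single drug needed to reach `effect` (% inhibition)."""
    return ec50 * (effect / (emax - effect)) ** (1.0 / hill)


def loewe_expected_inhibition(d1, d2, p1, p2, emax=100.0, tol=1e-9):
    """Expected % inhibition E under Loewe additivity, found by bisection.

    Solves d1 / D1(E) + d2 / D2(E) = 1, where Di(E) is the dose of drug i
    alone giving effect E. p1 and p2 are hypothetical (ec50, hill) pairs.
    """
    if d1 == 0 and d2 == 0:
        return 0.0

    def residual(e):
        # Positive for small E, negative as E approaches emax, decreasing in between.
        return d1 / hill_inverse(e, *p1, emax) + d2 / hill_inverse(e, *p2, emax) - 1.0

    lo, hi = 1e-12, emax - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def synergy_at(observed_viability, d1, d2, p1, p2):
    """Expected viability (panel c) minus observed viability (panel d), in %."""
    expected_viability = 100.0 - loewe_expected_inhibition(d1, d2, p1, p2)
    return expected_viability - observed_viability


inactive = (1e6, 1.0)  # hypothetical EC50 far above any tested dose
print(round(synergy_at(95.0, 1.0, 1.0, inactive, inactive), 2))  # ~5.0
```

With two essentially inactive drugs the expected viability stays close to 100%, so an observed value of 95% (inside the 90 to 100% band described above) already produces a positive synergy value of about 5 at that dose pair; summed over a whole matrix, such noise can add up to a non-negligible score.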

Note that a simple threshold cannot filter out this case, because we also observed drug pairs with similar scores that seem to have a real synergistic effect.



These figures were generated from the file ALK_2.TOP2.A549.Rep1.csv. Panels (d) and (e) show a strong synergistic effect for this pair when both drugs are used at 1 or 3 µM. Yet the synergy score reported for this pair is 24.53776.

We found a similar example for a drug pair with an antagonistic effect (file ERBB.PIK3CA_4.CAMA-1.Rep1.csv). This pair has the 6th lowest synergy score (-54.4884), which seems to stem from a problem in estimating the additive effect when each drug independently increases the cell population. Matrix (d) shows that both drugs increase the cell population of this cell line (the cell count varies between 112 and 119% for each drug), and that using both drugs leads to a similar increase. But the predicted additive effect, matrix (c), is capped at 100, leading to a strong antagonistic effect (matrix (e)) and one of the lowest synergy scores in the training data set.
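
A toy calculation shows how capping alone can create apparent antagonism. It assumes, hypothetically, that the uncapped additive expectation simply equals the observed stimulation (no interaction by construction) and uses a plain mean of expected-minus-observed viability as a stand-in for the challenge's integrated score; the matrix values are illustrative, not taken from the data file.

```python
def mean_synergy(expected, observed, cap=None):
    """Average of (expected - observed) viability over a dose matrix.

    `cap`, if given, clips the expected viability from above, mimicking an
    additive surface that is not allowed to exceed 100% of control.
    """
    diffs = []
    for exp_row, obs_row in zip(expected, observed):
        for e, o in zip(exp_row, obs_row):
            if cap is not None:
                e = min(e, cap)
            diffs.append(e - o)
    return sum(diffs) / len(diffs)


# Toy 3x3 matrix: both drugs stimulate growth and the combination behaves additively.
observed = [[115.0] * 3 for _ in range(3)]
expected = [[115.0] * 3 for _ in range(3)]  # uncapped expectation, no interaction

print(mean_synergy(expected, observed))           # 0.0
print(mean_synergy(expected, observed, cap=100))  # -15.0
```

The only difference between the two results is the cap: a neutral pair turns into a uniformly negative synergy surface.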



These examples clearly show that the synergy score does not always reflect the antagonistic or synergistic effect of a drug pair, and that it is affected by the intrinsic noise in the measurement of the cell population. As previously mentioned (see http://support.sagebase.org/sagebase/...), the Pearson correlation of synergy scores between replicates is only 0.5 and the RMSE is 32.03, which highlights the low robustness of this measure. It will therefore be hard to build a model that predicts exact synergy scores. We thus do not understand why the predictions will be evaluated by their correlation with the synergy score (a correlation that, for almost 100 drug pairs in the leaderboard data, will be computed on only 2 or 3 data points).
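
The concern about correlations computed on 2 or 3 points can be illustrated with a short sketch. The values below are toy numbers for one hypothetical drug pair measured on 3 cell lines; shifting a single score by 40, on the order of the reported replicate RMSE, is enough to flip the sign of the correlation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(x, y):
    """Root-mean-square difference between paired values (e.g. two replicates)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Toy example: one drug pair evaluated on only 3 cell lines.
predicted = [10.0, 20.0, 30.0]
measured = [5.0, 25.0, 30.0]
noisy = [5.0, 25.0, -10.0]  # same data with one score shifted by 40

print(round(pearson(predicted, measured), 3))  # 0.945
print(round(pearson(predicted, noisy), 3))     # -0.427
print(round(rmse(measured, noisy), 2))         # 23.09
```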
  • I understood that the synergy score is the average over all the dose combinations, so a few very high peaks in the synergy plot might be 'masked' by a majority of zero or even antagonistic areas.
    The first example has a very flat but entirely positive synergy landscape, whereas the second has a mix of positive and negative areas. Which one is more synergistic is a matter of expert opinion, but IMHO it is difficult to quantify with a single parameter anyway.
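
Jing's point about masking can be checked with two toy synergy matrices that share the same average but have very different shapes. A plain mean is used here as a hypothetical stand-in for the challenge score, whose exact definition is not given in this thread.

```python
def summary(matrix):
    """Mean, maximum and minimum of a synergy matrix."""
    values = [v for row in matrix for v in row]
    return sum(values) / len(values), max(values), min(values)


flat = [[5.0] * 3 for _ in range(3)]  # uniformly weak synergy everywhere
mixed = [[-10.0, 0.0, 30.0],
         [0.0, -5.0, 20.0],
         [5.0, 0.0, 5.0]]             # a strong local peak plus antagonistic wells

print(summary(flat))   # (5.0, 5.0, 5.0)
print(summary(mixed))  # (5.0, 30.0, -10.0)
```

Both matrices reduce to the same single number, even though only the second contains a region of strong synergy.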

  • The Loewe model should work for cases where drugs increase growth, so the additive predicted surface should not have to be capped at 100. Is there a reason why this is the case in Combenefit?

    Also, has anyone looked into some sort of statistical smoothing to preprocess a surface before any metrics are calculated?

  • Thank you for this interesting post. I am interested to know how the synergy score is calculated from the differences between observed and expected (i.e., Loewe additivity) cell killing. I believe we have been told that the synergy score is an integration over the matrix of observed-minus-expected differences, and also that the exact algorithm is not being provided. (Do I understand correctly that we don't know exactly how the synergy score is calculated?) I think these three examples show that it would be helpful to know how that integration is done.

  • bidwbb: I got a reply that the integration is done over non-zero concentrations
    http://support.sagebase.org/sagebase/...

    However, it does not seem to be so straightforward. In all the cases above, the concentrations run from x to 100x, so the area to integrate over is (log(100x) - log(x))^2 = log(100)^2 ≈ 21 with the natural logarithm. That is, the synergy score should be roughly 21 times the average synergy value in the Loewe model. This is clearly not the case.
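
The area arithmetic above can be checked with a short sketch. It assumes, as a simplification, that the score is a plain trapezoidal integral of the synergy surface over the log-concentration square spanning x to 100x; the dose grid is hypothetical, and the point is only to show how the choice of logarithm base changes the scale factor between the average synergy and the integral.

```python
import math


def log_dose_area(fold_range, common_log=False):
    """Area of the log-concentration square when each drug spans x .. fold_range * x."""
    side = math.log10(fold_range) if common_log else math.log(fold_range)
    return side ** 2


def trapz_1d(ys, xs):
    """Trapezoidal integral of ys over the points xs."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))


def trapz_2d(grid, row_conc, col_conc, common_log=False):
    """Integrate a synergy grid (rows: drug A doses, columns: drug B doses) over log doses."""
    log = math.log10 if common_log else math.log
    lx = [log(c) for c in col_conc]
    ly = [log(c) for c in row_conc]
    row_integrals = [trapz_1d(row, lx) for row in grid]  # integrate along drug B
    return trapz_1d(row_integrals, ly)                    # then along drug A


conc = [1.0, 3.0, 10.0, 30.0, 100.0]      # hypothetical non-zero doses spanning x .. 100x
flat = [[5.0] * len(conc) for _ in conc]  # constant synergy of 5 in every well

print(round(log_dose_area(100), 2))                            # 21.21
print(log_dose_area(100, common_log=True))                     # 4.0
print(round(trapz_2d(flat, conc, conc, common_log=True), 6))   # 20.0
```

Because log(0) is undefined, a log-dose integral of this kind can only cover non-zero concentrations, which is consistent with the reply quoted above.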

    The score is calculated over the whole dose-response surface; as per Jing's comments, it reflects a kind of average synergy. It cannot distinguish exactly between all cases. Think about it: you're reducing 25 or so values of the dose response to a single metric! Different distributions will end up with the same score.

    The problem highlighted in the third case arises because, in some cases, experimental issues lead to poor growth in the control wells (slower growth than expected), so that normalization shows a spurious overall stimulation. Because the fitting assumes no effect in control conditions, the fit is poor and slight antagonism may result. Note that cases where this is exacerbated are flagged.

    Note that the examples highlighted here have low scores. Real synergy and/or antagonism will have higher absolute scores. Unfortunately, with only one replicate, there are indeed limits to what is possible.

    Rich - we did look into smoothing, and several algorithms have actually been developed and will be released in a future version of Combenefit. Nevertheless, it was decided not to pre-process the challenge data by smoothing or any other means.

    Eemeli - by logarithmic space, the common logarithm is meant (log(100) = 2), so the integration area is 2^2 = 4 rather than about 21.

    I hope this clarifies things.

    Giovanni
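
Giovanni's explanation of the third example (poor growth in the control wells producing spurious stimulation after normalization) comes down to simple arithmetic. The sketch assumes, for illustration only, that each well is normalized by dividing its readout by a single control readout; the readouts are toy numbers, not challenge data, and the actual normalization pipeline is not described in this thread.

```python
def percent_of_control(raw_counts, control_count):
    """Normalize raw well readouts to % of the untreated control well."""
    return [100.0 * c / control_count for c in raw_counts]


treated = [1000.0, 990.0, 1010.0]  # toy readouts of wells with no real drug effect
expected_control = 1000.0          # toy readout of a control that grew normally
slow_control = 900.0               # toy readout of a control that grew more slowly

print(percent_of_control(treated, expected_control))                 # [100.0, 99.0, 101.0]
print([round(v, 1) for v in percent_of_control(treated, slow_control)])  # [111.1, 110.0, 112.2]
```

The treated wells are identical in both cases; only the denominator changes, yet the second normalization reports an apparent stimulation of more than 10%.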

  • Thank you yoghi for this detailed response.

    We apologize for the late reply.

    We acknowledge that the synergy scores of the first two examples are low, but they are still higher than 20, which is one of the thresholds used to define synergistic drug pairs in the Challenge 2 evaluation. We understand that the synergy score represents the average synergy of a drug pair. However, we believe that the synergy score reported for the drug pair BCL2_BCL2L1.EGFR is simply due to stochastic noise in the measurement of cell death, and that the synergy score for that pair should be closer to 0.

    Concerning our third example, as mentioned in our initial post, it has the 6th lowest synergy score (-54) in the training data. If we consider this a low score, we will have only 5 reliable antagonistic pairs in the training set!

    In addition, we found that the latest release of the training dataset contains some “duplicate” experiments (rows with the same cell line and drug combination). Although some of them are not ‘real’ replicates because the concentrations used differ, we would expect a good correlation for drug pairs with high synergy scores. The figure below shows the synergy scores of the replicates in the training data. The overall Pearson correlation is relatively low (0.38, lower than the value reported in http://support.sagebase.org/sagebase/...). Of the 6 drug pairs with a synergy score higher than 40 (the highest threshold used in the Challenge 2 evaluation to identify synergistic pairs), only 2 have a replicate with a similar synergy score, and 3 have a replicate with a synergy score close to 0.



    As previously mentioned (see http://support.sagebase.org/sagebase/...): “there are some differences in several factors affecting the monotherapy results, but combination synergy score should be more robust”. As we understand it, differences between assays can lead to different monotherapy response curves, predicted additive effects and combination response matrices, but the synergy score should be similar. To investigate the difference in synergy scores between the 2 experiments on BT-20 with the AKT.ERBB drug pair (the data point circled in orange), we used Combenefit to plot the matrices used to compute the synergy score.



    As expected, the monotherapy response curves and combination response matrices differ considerably between the experiments. But the difference in synergy score cannot be explained by the different concentrations used: even when the concentrations are matched (submatrices highlighted by black squares), we still see no similarity between the 2 experiments. How can the difference in synergy score be explained in this particular example?

    We acknowledge that our analysis is based on a small number of data points. Have other replicate experiments been performed? If so, would it be possible to share them with the participants, or at least to provide a scatter plot of the synergy scores, so we can see whether our observation still holds?
    The low reproducibility between experiments could explain the low correlations reported on the leaderboard (the best method has a global correlation of only 0.23 for Challenge 1A). Do you plan to take the inconsistency of the synergy score into account in the final evaluation strategy?

    Best Regards,
    C. Suphavilai
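
The two checks described in this post, pairing rows that share a cell line and drug combination and then comparing replicates only on the concentrations they have in common, could be sketched as follows. The dictionary keys (`cell_line`, `combination`, `score`), the helper names and all values are hypothetical; concentrations are compared after rounding to avoid spurious floating-point mismatches.

```python
from collections import defaultdict


def find_replicates(rows):
    """Group rows by (cell line, combination) and keep groups with more than one row."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["cell_line"], row["combination"])].append(row)
    return {key: grp for key, grp in groups.items() if len(grp) > 1}


def shared_concentrations(a, b, ndigits=6):
    """Concentrations of `a` that also appear in `b` (compared after rounding)."""
    rounded_b = {round(c, ndigits) for c in b}
    return [c for c in a if round(c, ndigits) in rounded_b]


def submatrix(matrix, row_conc, col_conc, keep_rows, keep_cols, ndigits=6):
    """Rows/columns of `matrix` whose concentrations are in keep_rows/keep_cols."""
    kr = {round(c, ndigits) for c in keep_rows}
    kc = {round(c, ndigits) for c in keep_cols}
    ri = [i for i, c in enumerate(row_conc) if round(c, ndigits) in kr]
    ci = [j for j, c in enumerate(col_conc) if round(c, ndigits) in kc]
    return [[matrix[i][j] for j in ci] for i in ri]


rows = [  # toy rows; real data would come from the challenge files
    {"cell_line": "CL1", "combination": "A.B", "score": 12.0},
    {"cell_line": "CL1", "combination": "A.B", "score": 45.0},
    {"cell_line": "CL2", "combination": "A.C", "score": -3.0},
]
print(list(find_replicates(rows)))  # [('CL1', 'A.B')]

conc_rep1 = [0.0, 0.3, 1.0, 3.0]
conc_rep2 = [0.0, 1.0, 3.0, 10.0]
common = shared_concentrations(conc_rep1, conc_rep2)  # [0.0, 1.0, 3.0]
rep1 = [[10 * r + c for c in range(4)] for r in range(4)]  # toy 4x4 response matrix
print(submatrix(rep1, conc_rep1, conc_rep1, common, common))
# [[0, 2, 3], [20, 22, 23], [30, 32, 33]]
```

The same `submatrix` call applied to the second replicate's matrix gives two matrices of equal shape that can be compared well by well.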

  • Hi C. Suphavilai,

    I'll have to be brief here, as you have listed quite a few questions and we will need time to go through them.

    We looked into this in the previous post you mentioned (http://support.sagebase.org/sagebase/...). The Pearson correlation between 632 assays with the same combination and cell line is 0.5. This is across the training, leaderboard and test sets, so a somewhat larger sample. I understand there is still quite a lot of variability in our results, but this is similar to what has been observed in other large-scale monotherapy assays (http://www.nature.com/nature/journal/...).

    We will definitely take the level of "noise" in the synergy scores into consideration, as we will compare prediction performance to random predictors. Please remember that we are not just interested in prediction accuracy, but also in whether your predictive models are interpretable and can yield potential genetic biomarkers.

    Best,
    Dennis
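
Dennis mentions comparing prediction performance to random predictors. The organizers' procedure is not described in this thread; one common way to build such a baseline, sketched here under that caveat, is a permutation test: shuffle the predictions, recompute the correlation, and count how often chance does at least as well. With only 3 distinct points, a perfect correlation is matched by exactly 1 of the 6 possible orderings, so on its own it cannot be distinguished from chance.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def permutation_test(pred, obs, n_perm=10000, seed=0):
    """Observed correlation and one-sided permutation p-value against shuffled predictions."""
    rng = random.Random(seed)
    r_obs = pearson(pred, obs)
    shuffled = list(pred)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # a "random predictor": same values, random assignment
        if pearson(shuffled, obs) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Three points: even a perfect correlation has a p-value close to 1/6.
print(permutation_test([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
# Ten points with the same perfect agreement give a far smaller p-value.
print(permutation_test(list(range(10)), [2.0 * v for v in range(10)]))
```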

  • Hi Chayaporn,

    In your REP1 vs. REP2 plots there seems to be a good correlation for high synergy scores, except for one outlier, for which you show the graphical analysis obtained with Combenefit.

    I cannot comment on the reproducibility, and obviously there are not enough cases here to judge it. But looking at the combination dose responses, it is clear that some enhancement (a dip in the surface) is present in one case and not in the other. You can also see that part of the difference in score is due to the difference in concentration ranges (because of the nature of the metric used here).

    In the attached plots, only the common concentration points have been kept, and the synergy distribution has been mapped onto the 3D dose-response surface for easier interpretation. So, strictly speaking, the different scores are a direct consequence of each replicate's experimental results.

    Typically, in a lab, all replicates will be considered for the final score. This cannot be done here but I hope that these explanations can help in interpreting the data.

    Best,
    Giovanni