
From antibody to zinc finger: an A-Z of Open Targets

OpenTargets or Open Targets? Target validation or target identification? Celiac disease or coeliac disease? As users discover our resources, take a closer look at the lexicon of terms that matter to us.

Following Halloween, the latest entry in our A-Z collection of blog posts is H.

Harmonic

If you have ever wondered how we compute the association scores in the Open Targets Platform, the chances are you have come across this word before, perhaps in our FAQs under "How do you score the associations?".

Harmonic. More specifically, harmonic function, or harmonic sum.

Wikipedia defines a harmonic progression as a “progression formed by taking the reciprocals of an arithmetic progression”. The harmonic sum we use builds on the same idea of reciprocals.

It looks like this:

$$S_1 + \frac{S_2}{2^2} + \frac{S_3}{3^2} + \frac{S_4}{4^2} + \dots + \frac{S_i}{i^2}$$

Where $S_1, S_2, \dots, S_i$ are the individual evidence scores sorted in descending order, i.e. from the highest to the lowest value.
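If you prefer code to formulas, here is a minimal Python sketch of the sum described above. The function name `harmonic_sum` and the example scores are made up for illustration; this is not the Platform's scoring script, just the formula written out.

```python
def harmonic_sum(scores):
    """Return S1 + S2/2^2 + S3/3^2 + ... for scores ranked from high to low."""
    # Sort in descending order so the strongest evidence gets rank 1.
    ranked = sorted(scores, reverse=True)
    # Divide each score by the square of its rank (1, 2, 3, ...) and add up.
    return sum(score / rank ** 2 for rank, score in enumerate(ranked, start=1))


# Hypothetical evidence scores, deliberately given out of order.
example_scores = [0.5, 1.0, 0.25]
total = harmonic_sum(example_scores)
print(round(total, 3))
```

With these three made-up scores the sum is 1.0 + 0.5/4 + 0.25/9, or roughly 1.153. Note how the top-ranked score dominates the total.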

We use the harmonic sum every time we aggregate evidence to score target-disease associations at the data source and data type levels, and also when we compute the overall association score.

How does this work in practice?

Let's pick Ustekinumab, an antibody we use as drug evidence to associate IL12B with Crohn's disease. We have several phase III trials where this drug is currently investigated for the modulation of IL12B in Crohn's patients.

[Figure: screenshot of the drug evidence associating IL12B with Crohn's disease in the Open Targets Platform]

Want to get this information programmatically? Easy peasy: use the public/evidence/filter endpoint.
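As a rough sketch of what such a request could look like, the snippet below only builds the request URL and does not call the API. The base URL, the parameter names (`target`, `disease`, `datatype`) and the identifiers used for IL12B and Crohn's disease are assumptions made for illustration; please check them against the current API documentation before relying on them.

```python
from urllib.parse import urlencode

# ASSUMPTION: base URL of the REST API; verify against the API documentation.
BASE_URL = "https://platform-api.opentargets.io/v3/platform"


def evidence_filter_url(target_id, disease_id, datatype=None):
    """Build a URL for the public/evidence/filter endpoint (nothing is sent)."""
    # ASSUMPTION: the endpoint takes 'target', 'disease' and 'datatype' parameters.
    params = {"target": target_id, "disease": disease_id}
    if datatype is not None:
        params["datatype"] = datatype
    return f"{BASE_URL}/public/evidence/filter?{urlencode(params)}"


# ASSUMPTION: illustrative identifiers for IL12B (Ensembl gene ID) and
# Crohn's disease (EFO ID); look them up in the Platform to confirm.
url = evidence_filter_url("ENSG00000113302", "EFO_0000384", "known_drug")
print(url)
```

You could then pass the resulting URL to your HTTP client of choice and inspect the JSON response.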

Having these six different trials is surely a good thing: the redundancy (or replication) here adds stronger support to the IL12B-Crohn's disease association, and we can be pretty confident in that link.

But at times too much of a good thing (too much replication, too many different trials) will skew the associations for targets and diseases that have a lot of data. Targets that are widely studied, for example, will have more evidence and higher association scores than those that are less so.

How can we strike the balance, accounting for replication within data source or between data types without skewing the analysis?

The answer is the harmonic function. Because each additional piece of evidence is divided by the square of its rank, the aggregated score approaches a plateau, and further evidence has less and less impact. So, even if we add extra trials, the final score will not be drastically affected.
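To see the plateau in numbers, here is a small Python sketch. It assumes a hypothetical, simplified case in which every piece of evidence has the same score of 1.0; the helper `harmonic_sum` is an illustrative name, and real evidence scores will of course vary.

```python
import math


def harmonic_sum(scores):
    """Sum the scores, each divided by the square of its rank (high to low)."""
    ranked = sorted(scores, reverse=True)
    return sum(score / rank ** 2 for rank, score in enumerate(ranked, start=1))


# Hypothetical case: n pieces of evidence, all with a score of 1.0.
counts = [1, 2, 6, 20, 100, 1000]
totals = {n: harmonic_sum([1.0] * n) for n in counts}
for n, total in totals.items():
    print(f"{n:>5} pieces of evidence -> {total:.3f}")

# For identical scores of 1.0 the sum can never exceed pi^2 / 6.
ceiling = math.pi ** 2 / 6
print(f"ceiling: {ceiling:.3f}")
```

In this toy example, going from one to six pieces of evidence lifts the sum from 1.0 to about 1.49, but going from six to a thousand adds only about 0.15 more, because the total is bounded by π²/6 (about 1.645).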

Figure: Pints of beer can be useful as a visualisation for how the harmonic is applied to aggregate scores from different data sources, giving rise to the overall association score. Image credit: Andrea Pierleoni.

Another advantage of the harmonic function is to deflate the effect of large amounts of data, e.g. evidence from text mining and expression data.

Where can you go for more information?

Head to the Supplementary data in the most recent Open Targets Platform paper for more details, or check the scoring script on GitHub.

And for those of you out there who love your maths, the publications by Nils T. Hagen and Insuk Lee and colleagues would be a good read on the nitty-gritty of the harmonic function.

Now that you know what the harmonic is and why we use it, we have a word of caution. When prioritising target-disease associations with the score, use it wisely.

The association score can be used to:

  • Rank target-disease associations
  • Show how confident we are in the association
  • Help you to design your null hypothesis
  • Help you to decide which target to pursue

The score will NOT be sufficient on its own. Use it in combination with other key attributes at the target level, for instance, tractability and safety information.

If in doubt, get in touch and we will be glad to help.

Denise Carvalho-Silva

Supporting drug discovery scientists, following post-doctorate research at the Australian National University and the Wellcome Sanger Institute, and working at Ensembl and GENCODE.
