What is new in the latest release — 17.07 — of the Open Targets Platform?

The latest version of our Open Targets Platform is out.

We have more diseases, more evidence, more associations and more data sources, but fewer targets. If you are wondering what changed and why, keep reading.

Let's have a look at the exact figures first, then introduce the new data sources and explain the reduced number of targets:

  • Targets: 26,122
  • Diseases: 9,150
  • Evidence: 5,347,817
  • Associations: 2,857,732

Hello and welcome, PheWAS Catalog and Genomics England PanelApp

We have added two new sources of genetic association data: the PheWAS Catalog and Genomics England PanelApp.

PheWAS Catalog

The PheWAS Catalog lists phenome-wide association studies (PheWAS), in which a single genetic variant (e.g. rs17513961) is associated with multiple phenotypes. The clinical phenotypes are derived from BioVU, the electronic medical record (EMR)-linked DNA biobank at Vanderbilt University Medical Center. We have mapped the ICD-9 codes to EFO using the EMBL-EBI lookup and mapping tools, OLS and Zooma, respectively.
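
To make the one-variant-to-many-phenotypes structure concrete, here is a minimal Python sketch of grouping PheWAS rows, one per (variant, ICD-9 code) pair, into a variant-to-phenotypes map. The lookup table ICD9_TO_EFO, the variant IDs rsA and rsB and the function phenotypes_by_variant are hypothetical; the hand-written table only stands in for the mapping actually produced with OLS and Zooma.

```python
from collections import defaultdict

# Hypothetical stand-in for the OLS/Zooma output: ICD-9 code -> EFO term label.
ICD9_TO_EFO = {
    "250": "diabetes mellitus",
    "401": "hypertension",
}


def phenotypes_by_variant(rows, icd9_to_efo):
    """Group (variant, ICD-9 code) rows into variant -> set of EFO terms.

    Codes missing from the lookup are collected separately instead of being
    dropped silently, so they can be reviewed or re-mapped later.
    """
    grouped = defaultdict(set)
    unmapped = set()
    for variant, icd9_code in rows:
        efo_term = icd9_to_efo.get(icd9_code)
        if efo_term is None:
            unmapped.add(icd9_code)
        else:
            grouped[variant].add(efo_term)
    return dict(grouped), unmapped


# Made-up rows: one variant linked to two phenotypes, plus a code absent from the table.
rows = [("rsA", "250"), ("rsA", "401"), ("rsB", "999.9")]
grouped, unmapped = phenotypes_by_variant(rows, ICD9_TO_EFO)
print(sorted(grouped["rsA"]))  # ['diabetes mellitus', 'hypertension']
print(unmapped)                # {'999.9'}
```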

By including the PheWAS data, we can now provide genetic evidence for associations that were previously supported only by text mining and/or animal models, thereby strengthening their association scores.
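
Why does an extra line of evidence raise the score? The sketch below assumes a harmonic-sum-style aggregation, in which the i-th highest evidence score is down-weighted by 1/i² and the total is normalised to the range 0 to 1. Treat it as an illustration of the idea rather than the Platform's exact scoring code: the function name harmonic_sum, the cap max_terms=100 and the example scores are all assumptions.

```python
def harmonic_sum(scores, max_terms=100):
    """Aggregate evidence scores in [0, 1] with a normalised harmonic sum.

    The i-th highest score (1-based) is weighted by 1 / i**2, so the strongest
    evidence dominates and each additional piece adds progressively less.
    """
    ranked = sorted(scores, reverse=True)[:max_terms]
    total = sum(score / i**2 for i, score in enumerate(ranked, start=1))
    # Normalise by the largest possible total (max_terms scores of 1.0).
    best_possible = sum(1 / i**2 for i in range(1, max_terms + 1))
    return total / best_possible


# Made-up scores for one target-disease pair.
before = harmonic_sum([0.4, 0.3])      # text mining + animal model only
after = harmonic_sum([0.4, 0.3, 0.6])  # ... plus a PheWAS genetic association
print(before < after)  # True
```

Because the new genetic score is the highest of the three, it takes the full weight of rank 1 and pushes the earlier evidence down to smaller weights, yet the total still rises.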

The evidence for MC1R in sensory system disease exemplifies this well. We can now support the genetic association with rs4785763, in addition to the previously available mouse data.

Genomics England PanelApp

The Genomics England PanelApp is a knowledgebase that combines crowdsourcing of expertise with curation to provide gene-disease relationships to aid the clinical interpretation of genomes within the 100,000 Genomes Project. This information is curated from the following sources:

  • Radboud University Medical Center
  • Illumina TruGenome Predisposition Screen
  • Emory Genetic Laboratory
  • UK Genetic Testing Network
  • Published and peer-reviewed articles
  • Expert lists
  • OMIM

Genes reported by at least three of these sources are assigned the highest level of confidence in PanelApp and are classified as ‘green genes’. We now include this subset of high-confidence genes, together with their phenotypes mapped to EFO, as evidence for genetic associations.
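
Read literally, the "at least three sources" rule is a threshold on the number of distinct sources per gene. A minimal sketch, assuming the input is a flat list of (gene, source) pairs; the function green_genes and the gene names are hypothetical, and PanelApp's own process also involves expert curation beyond a simple count.

```python
from collections import defaultdict


def green_genes(panel_entries, min_sources=3):
    """Return the genes reported by at least `min_sources` distinct sources.

    `panel_entries` is an iterable of (gene, source) pairs; repeated reports
    from the same source are counted once.
    """
    sources_by_gene = defaultdict(set)
    for gene, source in panel_entries:
        sources_by_gene[gene].add(source)
    return {gene for gene, sources in sources_by_gene.items() if len(sources) >= min_sources}


# Made-up entries using placeholder gene names.
entries = [
    ("GENE_A", "OMIM"),
    ("GENE_A", "Expert lists"),
    ("GENE_A", "UK Genetic Testing Network"),
    ("GENE_B", "OMIM"),
    ("GENE_B", "OMIM"),  # same source twice: still one source
    ("GENE_B", "Expert lists"),
]
print(green_genes(entries))  # {'GENE_A'}
```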

Target Enabling Package

We now include links to Target Enabling Packages (TEPs), a programme run by the Structural Genomics Consortium. Whenever a TEP is available for a target, it is linked from the target's profile page, as for CDK12.

The TEPs allow rapid exploration and characterisation of proteins with genetic linkage to key disease areas by providing some of the following:

  • Protein production methods
  • Biochemical and biophysical assays (e.g. for activity and affinity)
  • A protein structure
  • An antibody or nanobody
  • A cell-based assay
  • A CRISPR knockout

Goodbye cell lines with no disease as a study factor

The differential expression data from Expression Atlas can include studies on human cell lines that do not compare disease with normal samples. Instead, they compare conditions such as treatment with drug X vs no treatment in breast cancer, or mutant vs wild type for gene X in type II diabetes.

In studies like these, where disease is not a study factor, the evidence is assigned a low confidence level and the target activity is classified as ‘unknown’ rather than ‘increased’ or ‘decreased’.
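
The rule can be sketched in a few lines of Python. The dataclass Comparison, its fields and the function classify are hypothetical simplifications of the Expression Atlas data; taking the direction of change from the sign of a log2 fold change is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """One differential expression comparison for a target (hypothetical fields)."""
    target: str
    log2_fold_change: float
    disease_is_study_factor: bool


def classify(comparison):
    """Return (activity, low_confidence) for a comparison.

    Without a disease vs normal contrast, the direction of change says nothing
    about the disease, so activity is 'unknown' and confidence is low.
    """
    if not comparison.disease_is_study_factor:
        return "unknown", True
    # Differentially expressed genes are assumed to have a non-zero fold change.
    activity = "increased" if comparison.log2_fold_change > 0 else "decreased"
    return activity, False


print(classify(Comparison("GENE_A", 1.5, True)))    # ('increased', False)
print(classify(Comparison("GENE_A", -2.0, False)))  # ('unknown', True)
```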

We have started removing these low-confidence experiments from our analyses to strengthen the associations based on differential expression evidence. This has led to a decrease in the number of targets.
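
The drop in target numbers follows directly: a target whose only evidence is low confidence disappears once that evidence is filtered out. A minimal sketch over hypothetical dict records with target and low_confidence keys, assuming for simplicity that these records are each target's only evidence; the function names are illustrative.

```python
def drop_low_confidence(evidence):
    """Keep only the evidence records that are not flagged as low confidence."""
    return [record for record in evidence if not record["low_confidence"]]


def count_targets(evidence):
    """Count distinct targets supported by at least one evidence record."""
    return len({record["target"] for record in evidence})


# Made-up records: GENE_B is supported only by low-confidence evidence.
evidence = [
    {"target": "GENE_A", "low_confidence": False},
    {"target": "GENE_A", "low_confidence": True},
    {"target": "GENE_B", "low_confidence": True},
]
print(count_targets(evidence))                       # 2
print(count_targets(drop_low_confidence(evidence)))  # 1
```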

You may still encounter examples of ‘unknown’ activity, such as in the association between PTEN and the phenotype thrombocytopenia.

We will continue to refine this data for our future releases.

Other updates

In addition to the pathogenic variants available in previous releases, we now include variants with the following types of clinical significance:

  • Drug response
  • Association
  • Risk factor
  • Protective
  • Affects

These are a subset of the options for clinical significance used by ClinVar.
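
A filter over these values might look like the sketch below. The function keep_variant is hypothetical, and the assumption that multiple values arrive as one string joined by "," or "/" should be checked against the ClinVar export you actually use.

```python
import re

# The clinical significance values listed above, plus the previously included
# "pathogenic", compared in lower case.
INCLUDED_SIGNIFICANCE = {
    "pathogenic",
    "drug response",
    "association",
    "risk factor",
    "protective",
    "affects",
}


def keep_variant(clinical_significance):
    """Return True if any of the variant's clinical significance terms is included.

    Assumes multiple terms are joined by ',' or '/', e.g. "Pathogenic, risk factor".
    """
    terms = {term.strip().lower() for term in re.split(r"[,/]", clinical_significance)}
    return not terms.isdisjoint(INCLUDED_SIGNIFICANCE)


for value in ["Pathogenic", "risk factor", "Benign", "Likely benign/Benign"]:
    print(value, keep_variant(value))
```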

We have also added around 18,000 variants from targeted array studies in the GWAS Catalog.

This release also includes some web updates. Check our release notes for more.

Any questions or suggestions?
We are all ears and would love to hear from you.

Get in touch by email or via Twitter, Facebook or LinkedIn.

And by the way, if you want to replicate the hero image of this post, here is Gautier Koscielny's recipe:

  • Parse the evidence strings and filter on PheWAS
  • Extract the disease term, p-value and odds ratio information
  • Map the disease to a therapeutic area
  • Plot this using your favourite viz tool, such as Spotfire
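
The first three steps of the recipe can be sketched in Python. The flat record layout (source, disease, p_value, odds_ratio), the source label phewas_catalog and the THERAPEUTIC_AREA table are assumptions for illustration, not the actual evidence string schema. The p-value is converted to -log10(p), a common axis for PheWAS plots.

```python
import json
import math
from collections import defaultdict

# Hypothetical disease -> therapeutic area lookup; in practice this would be
# derived from the EFO hierarchy.
THERAPEUTIC_AREA = {
    "hypertension": "cardiovascular disease",
    "asthma": "respiratory disease",
}


def phewas_points(evidence_lines):
    """Steps 1-3: keep PheWAS evidence, extract fields, group by therapeutic area."""
    points_by_area = defaultdict(list)
    for line in evidence_lines:
        record = json.loads(line)
        if record.get("source") != "phewas_catalog":
            continue  # step 1: filter on PheWAS
        area = THERAPEUTIC_AREA.get(record["disease"], "other")  # step 3
        points_by_area[area].append(
            {  # step 2: disease term, p-value (as -log10) and odds ratio
                "disease": record["disease"],
                "neg_log10_p": -math.log10(record["p_value"]),
                "odds_ratio": record["odds_ratio"],
            }
        )
    return dict(points_by_area)


# Made-up, simplified evidence strings (real ones are nested JSON documents).
lines = [
    '{"source": "phewas_catalog", "disease": "hypertension", "p_value": 1e-5, "odds_ratio": 1.3}',
    '{"source": "gwas_catalog", "disease": "hypertension", "p_value": 1e-8, "odds_ratio": 1.1}',
]
points = phewas_points(lines)
print(points)
```

For step 4, the grouped points can be written out (for example with the csv module) and loaded into Spotfire or any other plotting tool.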