
Prioritising your list of potential drug targets with tractability assessments

This blog is a joint contribution by Melanie Schneider (ChEMBL) and Andrew Hercules (Open Targets at EMBL-EBI). You can contact Melanie and her colleagues at chembl-help@ebi.ac.uk.

When identifying and prioritising potential drug targets, one of the most important considerations is the tractability of a given target, or the confidence that we can identify a modulator that interacts with the target to elicit a desired biological effect.

In our 18.10 release, we included tractability assessments for small molecule and antibody modalities, which were generated by one of Open Targets’ informatics projects.

Recently, we sat down for a Q&A with Melanie Schneider of the ChEMBL team to learn more about the pipeline, the changes that users will see in our upcoming release, and ongoing work that will be released later this year.

1. Hi Melanie! Thank you for joining us for a Q&A for the Open Targets blog. Can you start by telling us a little bit about yourself?

I am a Protein Computational Scientist with the ChEMBL team here at the European Bioinformatics Institute (EMBL-EBI). I joined the team in January 2020 after completing a PhD in structural bioinformatics at the University of Montpellier. During my PhD, I was based in the Centre de Biochimie Structurale de Montpellier and I used virtual screening, machine learning, and the incorporation of protein flexibility and quantification of binding affinities for computer-aided drug design. Prior to commencing my PhD, I worked in research roles for various organisations, including DKFZ - German Cancer Research Centre, EMBL in Heidelberg, and the Max Planck Institute of Molecular Cell Biology and Genetics in Dresden. I also completed an MSc in molecular bioengineering and a BSc in integrated life sciences.

Outside of work, I like to dance (ballroom and latin) and participate in other sports like running and badminton. I also enjoy cultural activities and taking part in scientific outreach events, such as “Pint of Science”.

2. What sparked your interest in the tractability project with Open Targets?

Towards the end of my PhD studies, I was actively looking for roles in translational research. Being exposed to computer-aided drug design gave me an opportunity to translate my research outputs into useful insights for organisations engaged in drug discovery. I was motivated to join the Open Targets at EBI team because I had previously worked at EMBL and I was impressed by its focus on open science and providing open source tools and open access publications for the wider scientific community to use. I was particularly drawn to Open Targets because the collaborative nature of the consortium would give me the chance to work in teams and with people with different backgrounds and expertise. And the tractability project was particularly appealing because it is a very important assessment step at the beginning of drug development and there is considerable interest from both academia and industry. Tractability also involves an assessment of protein structures, which is similar to my previous work, but I like that it extends these assessments into a broader, translational use case.

3. What does the tractability pipeline do? What data sources does it rely on?

The tractability pipeline is a computational pipeline that was originally developed by research scientists at GSK to perform a bucket-based assessment of the tractability (or druggability) of a given target for small molecule or antibody modalities.

The pipeline uses Ensembl and UniProt identifiers to query various public databases in order to generate the tractability assessments:

  • ChEMBL
  • PDBe
  • UniProt
  • Human Protein Atlas
  • Gene Ontology
  • DrugEBIlity pipeline
  • TMHMM

It also uses the list of druggable genes identified by Finan et al. in “The druggable genome and support for target identification and validation in drug development”.

The pipeline takes less than an hour to run and it produces a .TSV file that is integrated in the Open Targets Platform and is available for download for further downstream analysis work.

Interested in learning more about how the Open Targets Platform makes use of this data? Head to the target tractability page in our docs and check the FAQs for our definition of target tractability (also known as druggability).

4. In the next release, we are expecting some changes to the tractability data. Can you talk to us about those changes and what we can expect in the latest pipeline output?

When the data was first integrated into the Open Targets Platform, it relied on ChEMBL v25 and the pipeline only produced an assessment for small molecule and antibody modalities in a single workflow. The pipeline also only included targets with at least one piece of tractability data in the .TSV output file.

In the upcoming release, the pipeline makes use of the most recent ChEMBL version - v26 - and also includes updated data from the other data sources. The pipeline itself has been expanded and subdivided into the following individual modality workflows:

  • Small molecule tractability
  • Antibody tractability
  • Clinical evidence for other modalities

We have also expanded the .TSV file produced by the pipeline to expose more of the underlying data that supports a tractability assessment. For example, in addition to data about PDBe structures or UniProt subcellular locations, the .TSV file also includes ChEMBL IDs for drugs that support a clinical assessment in buckets 1, 2, or 3 for all modality workflows. The .TSV file also includes an entry for targets with no tractability assessment data as this could be useful for further downstream analysis work.

For a description of each bucket, check the target tractability page.
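As a rough illustration of the downstream analysis this enables, the sketch below reads a small tab-separated excerpt, separates targets with and without a small molecule assessment, and collects the drug identifiers behind buckets 1 to 3. The column names (`sm_top_bucket`, `sm_clinical_chembl_ids`) and all rows are hypothetical placeholders, not the real layout of the pipeline output.

```python
import csv
import io

# Hypothetical excerpt: the column names and rows below are invented for
# illustration and do not reproduce the real pipeline output.
SAMPLE_TSV = (
    "ensembl_gene_id\tsymbol\tsm_top_bucket\tsm_clinical_chembl_ids\n"
    "ENSG_EXAMPLE_1\tGENE1\t1\tCHEMBL_EXAMPLE_A,CHEMBL_EXAMPLE_B\n"
    "ENSG_EXAMPLE_2\tGENE2\t4\t\n"
    "ENSG_EXAMPLE_3\tGENE3\t\t\n"
)


def split_by_assessment(tsv_text):
    """Separate targets that have a small molecule bucket from those without."""
    assessed, unassessed = [], []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # An empty bucket field is treated as "no tractability data".
        (assessed if row["sm_top_bucket"] else unassessed).append(row)
    return assessed, unassessed


def clinical_drug_ids(rows):
    """Map each target in buckets 1-3 to the drug IDs listed as support."""
    result = {}
    for row in rows:
        if row["sm_top_bucket"] in {"1", "2", "3"}:
            ids = [i for i in row["sm_clinical_chembl_ids"].split(",") if i]
            result[row["symbol"]] = ids
    return result


assessed, unassessed = split_by_assessment(SAMPLE_TSV)
print(len(assessed), "assessed;", len(unassessed), "without data")
print(clinical_drug_ids(assessed))
```

With the real file, the same approach should work once the placeholder column names are replaced with those found in the downloaded header row.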

5. You also made a change to bucket 4 in the small molecule workflow. Can you tell us more about the change and why you made it?

In the tractability data for the 20.04 release, we changed the ligand filtering of PDB structures in bucket 4 to increase true positives and reduce false negatives. In this bucket, the pipeline assesses whether a crystallographic structure containing a small molecule ligand is available in PDBe for a given target. I realised that the targets annotated in bucket 4 can change significantly depending on which types of ligands are included or filtered out. Initially, a few buffer compounds, solvents, cofactors and sugars were excluded.

However, I spotted two issues that influenced bucket 4 assessments.

First, the list of buffer compounds and solvents was not very large and one group was missing: crystallisation agents/additives. These compounds are often present at high concentrations in the crystallisation conditions and therefore tend to “stick” to proteins in various non-specific places.

Second, the list of unwanted molecules also included potential cofactors. For some proteins, these are considered cofactors/coenzymes, but for others, they represent substrates, such as the energy-providing nucleotides ATP, GTP, and NADP. For example, many kinases are targeted by drugs that mimic ATP binding, so filtering out ATP-bound structures would discard relevant evidence.

Therefore, the new ligand filtering list enables us to exclude more unwanted ligands (475) than before (95), including buffer compounds, solvents, crystallisation agents/additives and sugars, but no potentially ambiguous cofactors.
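The filtering idea can be sketched in a few lines. This is a minimal illustration, not the pipeline's code: the exclusion set holds a handful of common PDB chemical component codes for water, solvents and additives rather than the actual 475-entry list, and the structure names and function names are hypothetical.

```python
# Illustrative exclusion set only. These are common PDB chemical component
# codes for water, solvents and additives, not the pipeline's real list.
UNWANTED_LIGANDS = {"HOH", "GOL", "EDO", "SO4", "PEG"}


def has_informative_ligand(ligand_ids, unwanted=UNWANTED_LIGANDS):
    """Return True if at least one ligand survives the exclusion filter."""
    return any(ligand not in unwanted for ligand in ligand_ids)


def target_qualifies(structures, unwanted=UNWANTED_LIGANDS):
    """A target qualifies if any of its structures keeps a ligand."""
    return any(
        has_informative_ligand(ligands, unwanted)
        for ligands in structures.values()
    )


# Hypothetical structures for one target, keyed by a made-up structure name.
structures = {
    "structure_a": ["GOL", "SO4"],  # only additives, so filtered out
    "structure_b": ["ATP", "EDO"],  # ATP is kept because cofactors are not excluded
    "structure_c": [],              # no ligand at all
}

for name, ligands in structures.items():
    print(name, has_informative_ligand(ligands))
print("target qualifies:", target_qualifies(structures))
```

Keeping ATP out of the exclusion set mirrors the decision described above: an ambiguous cofactor is treated as a potential ligand rather than as crystallisation debris.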

6. And what can we expect from the pipeline over the coming months?

In the next few months, ChEMBL will be releasing our PROTAC tractability assessment data. PROTACs are hetero-bifunctional molecules designed to degrade target proteins (instead of inhibiting them) by “hijacking” the ubiquitin-proteasome system and therefore show different tractability requirements/characteristics. This will utilise data from a number of data sources that explore clinical evidence, protein target half-life, ubiquitination sites, and literature on PROTAC drug discovery. We are also exploring new methods to replace the DrugEBIlity score used for buckets 5 and 6 in the small molecule workflow.

We are also working on producing a JSON output that would be available alongside the .TSV file output. A JSON format would allow the pipeline to show hierarchical/nested data structures for the data that supports tractability assessments. It also opens up new visualisation opportunities for the Open Targets Platform - for example, showing the location and confidence of antibody assessments using a mock-up of a cell structure.
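To make the contrast with a flat .TSV row concrete, here is a sketch of what a nested record could look like and how a visualisation might consume it. Every key, value and function name below is invented for illustration, since the JSON schema has not been published.

```python
import json

# Hypothetical record: every key and value here is invented for illustration.
record = {
    "target": "GENE1",
    "antibody": {
        "top_bucket": 4,
        "evidence": [
            {"source": "UniProt", "location": "plasma membrane",
             "confidence": "high"},
            {"source": "Gene Ontology", "location": "extracellular region",
             "confidence": "medium"},
        ],
    },
    "small_molecule": {"top_bucket": None, "evidence": []},
}


def locations_by_confidence(rec, modality="antibody"):
    """Group the evidence locations of one modality by confidence level."""
    grouped = {}
    for item in rec[modality]["evidence"]:
        grouped.setdefault(item["confidence"], []).append(item["location"])
    return grouped


# Round-trip through JSON text to show the nesting survives serialisation.
serialised = json.dumps(record, indent=2)
restored = json.loads(serialised)
print(locations_by_confidence(restored))
```

A flat row would need one column per source and location, whereas the nested form lets each modality carry a variable-length list of supporting evidence.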

7. Thank you for sitting down with us Melanie for today’s Q&A. Before we go, anything else to add?

I would encourage your readers to download the data file and see all of the changes that have been made. They can also check out the tractability pipeline code on GitHub. And if they have any suggestions on target tractability, please get in touch with the ChEMBL team.

Blog header image source: Technology vector - fullvector