Target Validation API Tutorial: Getting Started

Since release 1.1 the Target Validation Platform exposes a public REST API to allow programmatic retrieval of data served at targetvalidation.org.
This is the same API that powers our website and gives full access to the data used to build www.targetvalidation.org.

The available methods are divided into:

  • public - Methods that serve the core set of our data and that we will keep stable and support.
  • private - Methods used by the web app to serve additional data. These methods change often and thus should not be relied upon.
  • utils - Methods to get statistics and technical data about the API.
  • auth - Methods used for authentication. Only relevant if you have been issued an API key (you can request one by emailing us).

Each of the methods is described in detail in the API documentation, where each API call can be tested using an interactive interface powered by swagger-ui.

The tools to use

Before diving into the tutorial, it is worth touching briefly on which tools are available to start using our API. If you are familiar with REST APIs and HTTP calls you can probably skip ahead to the next section.

Opening a query URL such as https://www.targetvalidation.org/api/latest/public/utils/stats in a browser will return some statistics on the data included in the Target Validation Platform in JSON format.
However, using a browser is impractical for anything more complicated than a simple query.

On the command line, the classic option is to use the curl command. A typical curl call would be:

curl -X GET <query url>  

So to check the version of the latest available API, the call would be:

curl -X GET https://www.targetvalidation.org/api/latest/public/utils/version  

which returns

1.1  

and to get some statistics:

curl -X GET https://www.targetvalidation.org/api/latest/public/utils/stats  

which returns the following JSON data

{"associations": {"datatypes": {"literature": {"total": 1031205, "datasources": {"europepmc": {"total": 1031205}}}, "rna_expression": {"total": 711158, "datasources": {"expression_atlas": {"total": 711158}}}, "genetic_association": {"total": 107993, "datasources": {"eva": {"total": 27356}, "gwas_catalog": {"total": 58851}, "uniprot": {"total": 12739}, "uniprot_literature": {"total": 45185}}}, "somatic_mutation": {"total": 20019, "datasources": {"cancer_gene_census": {"total": 19488}, "eva_somatic": {"total": 990}}}, "known_drug": {"total": 51316, "datasources": {"chembl": {"total": 51316}}}, "animal_model": {"total": 613231, "datasources": {"phenodigm": {"total": 613231}}}, "affected_pathway": {"total": 3198, "datasources": {"reactome": {"total": 3198}}}}, "total": 2175851}, "evidencestrings": {"datatypes": {"literature": {"total": 3166744, "datasources": {"europepmc": {"total": 3166744}}}, "rna_expression": {"total": 433809, "datasources": {"expression_atlas": {"total": 433809}}}, "genetic_association": {"total": 55964, "datasources": {"eva": {"total": 19857}, "gwas_catalog": {"total": 25536}, "uniprot": {"total": 4786}, "uniprot_literature": {"total": 5785}}}, "somatic_mutation": {"total": 10247, "datasources": {"cancer_gene_census": {"total": 9790}, "eva_somatic": {"total": 457}}}, "known_drug": {"total": 29812, "datasources": {"chembl": {"total": 29812}}}, "animal_model": {"total": 395407, "datasources": {"phenodigm": {"total": 395407}}}, "affected_pathway": {"total": 9468, "datasources": {"reactome": {"total": 9468}}}}, "total": 4101451}, "targets": {"total": 24716}, "diseases": {"total": 8051}}

A more user-friendly alternative is the httpie tool. It offers a simpler syntax for querying the API methods and formats the response to improve readability. After installing httpie, the same API call as above returns much clearer output:

http https://www.targetvalidation.org/api/latest/public/utils/stats  
{
    "associations": {
        "datatypes": {
            "affected_pathway": {
                "datasources": {
                    "reactome": {
                        "total": 3198
                    }
                },
                "total": 3198
            },
            "animal_model": {
                "datasources": {
                    "phenodigm": {
                        "total": 613231
                    }
                },
                "total": 613231
            },
            "genetic_association": {
                "datasources": {
                    "eva": {
                        "total": 27356
                    },
                    "gwas_catalog": {
                        "total": 58851
                    },
                    "uniprot": {
                        "total": 12739
                    },
                    "uniprot_literature": {
                        "total": 45185
                    }
                },
                "total": 107993
            },
            "known_drug": {
                "datasources": {
                    "chembl": {
                        "total": 51316
                    }
                },
                "total": 51316
            },
            "literature": {
                "datasources": {
                    "europepmc": {
                        "total": 1031205
                    }
                },
                "total": 1031205
            },
            "rna_expression": {
                "datasources": {
                    "expression_atlas": {
                        "total": 711158
                    }
                },
                "total": 711158
            },
            "somatic_mutation": {
                "datasources": {
                    "cancer_gene_census": {
                        "total": 19488
                    },
                    "eva_somatic": {
                        "total": 990
                    }
                },
                "total": 20019
            }
        },
        "total": 2175851
    },
    "diseases": {
        "total": 0
    },
    "evidencestrings": {
        "datatypes": {
            "affected_pathway": {
                "datasources": {
                    "reactome": {
                        "total": 9468
                    }
                },
                "total": 9468
            },
            "animal_model": {
                "datasources": {
                    "phenodigm": {
                        "total": 395407
                    }
                },
                "total": 395407
            },
            "genetic_association": {
                "datasources": {
                    "eva": {
                        "total": 19857
                    },
                    "gwas_catalog": {
                        "total": 25536
                    },
                    "uniprot": {
                        "total": 4786
                    },
                    "uniprot_literature": {
                        "total": 5785
                    }
                },
                "total": 55964
            },
            "known_drug": {
                "datasources": {
                    "chembl": {
                        "total": 29812
                    }
                },
                "total": 29812
            },
            "literature": {
                "datasources": {
                    "europepmc": {
                        "total": 3166744
                    }
                },
                "total": 3166744
            },
            "rna_expression": {
                "datasources": {
                    "expression_atlas": {
                        "total": 433809
                    }
                },
                "total": 433809
            },
            "somatic_mutation": {
                "datasources": {
                    "cancer_gene_census": {
                        "total": 9790
                    },
                    "eva_somatic": {
                        "total": 457
                    }
                },
                "total": 10247
            }
        },
        "total": 4101451
    },
    "targets": {
        "total": 23250
    }
}
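Whichever client you use, the response is plain JSON, so summarizing it takes only a few lines. The sketch below uses Python's standard json module on a trimmed copy of the stats response above (only two datatypes are kept, for brevity) and lists the number of associations per datatype, largest first. The function name association_totals is purely illustrative.

```python
import json

# A trimmed copy of the /public/utils/stats response shown above.
stats_text = """
{"associations": {"datatypes": {
    "literature": {"total": 1031205, "datasources": {"europepmc": {"total": 1031205}}},
    "known_drug": {"total": 51316, "datasources": {"chembl": {"total": 51316}}}},
  "total": 2175851}}
"""

def association_totals(stats):
    """Return (datatype, total) pairs for associations, largest first."""
    datatypes = stats["associations"]["datatypes"]
    pairs = [(name, info["total"]) for name, info in datatypes.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

stats = json.loads(stats_text)
for name, total in association_totals(stats):
    print(f"{name}: {total}")
```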

An easy way to construct more complex queries is to head to our interactive API interface, which allows you to input parameters for each method and visualise a nicely formatted response, as well as the URL for each call.

Another very popular option for getting data programmatically from the API is Python, using the excellent requests library. You can build a GET request and, for each response, read the status code, the headers and the content, either as a string or deserialized into a JSON object.

>>> import requests

>>> r = requests.get('https://www.targetvalidation.org/api/latest/public/utils/stats')

>>> r.status_code
200

>>> r.headers['content-type']
'application/json'

>>> r.text
'{"associations": {"datatypes": { ...'

>>> r.json()
{'associations': {'datatypes': {'literature': {'total': 1029501, 'datasources': {'europepmc': {'total': 1029501}}}, 'rna_expression': {'total': 711158 ... }

Finding the identifier (id) of a target or a disease

As described in the platform's about section, targetvalidation.org brings together evidence which associates potential drug targets with diseases.

To access target information through the API, it is necessary to use the Ensembl gene id for the corresponding gene (e.g. the Ensembl gene identifier for NOD2 is ENSG00000167207).

We map diseases to terms in the Experimental Factor Ontology (EFO). Each disease is linked to an EFO id (e.g. the EFO id for "inflammatory bowel disease" is EFO_0003767).

To make it easier to find the id of a target, we can use the /public/search method of the API. This method replicates the search box on the targetvalidation.org home page. Using this method, it is possible to search for a gene or protein by its symbol, common name or any synonym specified in the major biological databases. The response contains the id, together with summary data about what is known about the target.

If we search for the NOD2 gene, restrict the results to targets with the filter parameter and limit the response to the first result by setting the size parameter to 1, we get:

http https://www.targetvalidation.org/api/latest/public/search q==NOD2 size==1 filter==target  
HTTP/1.1 200 OK  
Content-Type: application/json  
{
    "data": [
        {
            "data": {
                "approved_name": "nucleotide binding oligomerization domain containing 2",
                "approved_symbol": "NOD2",
                "association_counts": {
                    "direct": 221,
                    "total": 380
                },
                "biotype": "protein_coding",
                "description": "Involved in ...",
                "ensembl_gene_id": "ENSG00000167207",
                ...
                "name_synonyms": [
                    "Inflammatory bowel disease protein 1",
                    "nucleotide-binding oligomerization domain, ...
                ],
                "symbol_synonyms": [
                    "BLAU",
                    "CD",
                    ...
                ],
                "top_associations": {
                    "direct": [
                        {
                            "id": "ENSG00000167207-EFO_0000701",
                            "score": 1.0
                        }, ...
                    ],
                    "total": [
                        {
                            "id": "ENSG00000167207-EFO_0000540",
                            "score": 1.0
                        }, ...
                    ]
                },
                "type": "target",
                "uniprot_accessions": [
                    "Q9HC29",...
                ]
            },
            ...
}
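In httpie, name==value items are sent as URL query-string parameters, so the call above is an ordinary GET request. To reproduce it with curl or any other client, you can build the equivalent URL yourself. A minimal sketch using Python's standard urllib.parse (the helper name query_url is illustrative):

```python
from urllib.parse import urlencode

BASE_URL = "https://www.targetvalidation.org/api/latest"

def query_url(path, **params):
    """Build the GET URL for an API path plus query-string parameters."""
    # urlencode percent-encodes values and joins them with "&".
    return f"{BASE_URL}{path}?{urlencode(params)}"

# Equivalent of: http .../public/search q==NOD2 size==1 filter==target
print(query_url("/public/search", q="NOD2", size=1, filter="target"))
```

Quote the resulting URL when passing it to curl, since the shell treats & specially.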

A tool such as jq can be useful for parsing JSON responses on the command line. A jq filter selects a field by writing a . followed by the field name, and filters can be chained with |.
Thus the same query can be piped to jq to obtain the Ensembl gene id.

http https://www.targetvalidation.org/api/latest/public/search q==NOD2 size==1 filter==target | jq '.data[] | .id'  
"ENSG00000167207"

There are other options for finding the Ensembl gene identifier of a target, including the Ensembl REST API, but we recommend looking up ids using our own API to ensure consistency with subsequent queries.

You can obtain the same result in Python by parsing the JSON into a dictionary and selecting the right key:

>>> import requests
>>> from pprint import pprint
>>> r = requests.get('https://www.targetvalidation.org/api/latest/public/search',
...                  params={"q": "NOD2", "size": 1, "filter": "target"})
>>> pprint(r.json())

{'data': [{'data': {'approved_name': 'nucleotide binding oligomerization '
                                     'domain containing 2',
                    'approved_symbol': 'NOD2',
                    'association_counts': {'direct': 221, 'total': 380},
                    'biotype': 'protein_coding',
                    'description': 'Involved in ...',
                    'ensembl_gene_id': 'ENSG00000167207',
...

which prints the parsed response as a Python dictionary. To select a specific value, traverse the dictionary by key and index:

>>> r.json()['data'][0]['id']
'ENSG00000167207'  

The process for finding disease ids is very similar, although for less common diseases, or those that have many synonyms, it is advisable to return more than one result at a time and then pick the most appropriate EFO id.

>>> r = requests.get('https://www.targetvalidation.org/api/latest/public/search',
...                  params={"q": "inflammatory bowel disease", "size": 1})

>>> r.json()['data'][0]['id']
'EFO_0003767'  
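When you ask for several hits, it helps to list them side by side before settling on an id. The helper below is a sketch that pulls the top-level id of every hit out of a parsed search response, together with the type field found inside each hit's nested data object. The sample dictionary is hypothetical: it only mimics the shape of the responses shown earlier, and the "disease" type value is an assumption.

```python
def hit_ids(search_response):
    """Return (id, type) for each hit in a parsed /public/search response."""
    hits = []
    for hit in search_response["data"]:
        # "type" sits inside the nested "data" object (e.g. "target");
        # .get() returns None instead of failing if a hit lacks it.
        hit_type = hit.get("data", {}).get("type")
        hits.append((hit["id"], hit_type))
    return hits

# Hypothetical, heavily trimmed response used only for illustration.
sample = {"data": [
    {"id": "EFO_0003767", "data": {"type": "disease"}},
    {"id": "EFO_0000384", "data": {"type": "disease"}},
]}
print(hit_ids(sample))
```

With requests, the input would be something like r.json() for a search with "size" set to 5 or 10.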

Just as for targets, it is possible to find EFO ids by querying the ontology directly through the Ontology Lookup Service API, but we recommend using our API, since the EFO version served there may not always be in sync with the one we use in targetvalidation.org.

Finding associations between target and disease

With an id for a target and a disease, it is possible to query the API for the presence of any associations linking the two using the public/association/filter method.
Continuing with the inflammatory bowel disease (IBD) example:

http https://www.targetvalidation.org/api/latest/public/association/filter target==ENSG00000167207 disease==EFO_0003767  

returns the association JSON object summarizing the data in targetvalidation.org that links NOD2 to IBD:

{
    "data": [
        {
            "association_score": {
                "datasources": {
                    "cancer_gene_census": 0.0,
                    "chembl": 0.0,
                    "disgenet": 0.0,
                    "europepmc": 0.3746566730517112,
                    "eva": 0.0,
                    "eva_somatic": 0.0,
                    "expression_atlas": 0.024180000000000004,
                    "gwas_catalog": 0.7180094552025357,
                    "phenodigm": 0.17046,
                    "reactome": 0.0,
                    "uniprot": 0.0,
                    "uniprot_literature": 1.0
                },
                "datatypes": {
                    "affected_pathway": 0.0,
                    "animal_model": 0.17046,
                    "genetic_association": 1.0,
                    "known_drug": 0.0,
                    "literature": 0.3746566730517112,
                    "rna_expression": 0.024180000000000004,
                    "somatic_mutation": 0.0
                },
                "overall": 1.0
            },
            "disease": {
                "efo_info": {
                    "label": "inflammatory bowel disease",
                    "path": [
                        [
                            "EFO_0000405",
                            "EFO_0003767"
                        ],
                        [
                            "EFO_0000540",
                            "EFO_0005140",
                            "EFO_0003767"
                        ]
                    ],
                    "therapeutic_area": {
                        "codes": [
                            "EFO_0000405",
                            "EFO_0000540"
                        ],
                        "labels": [
                            "immune system disease",
                            "digestive system disease"
                        ]
                    }
                },
                "id": "EFO_0003767"
            },
            "evidence_count": {
                "datasources": {
                    "cancer_gene_census": 0.0,
                    "chembl": 0.0,
                    "disgenet": 0.0,
                    "europepmc": 1028.0,
                    "eva": 0.0,
                    "eva_somatic": 0.0,
                    "expression_atlas": 1.0,
                    "gwas_catalog": 13.0,
                    "phenodigm": 3.0,
                    "reactome": 0.0,
                    "uniprot": 0.0,
                    "uniprot_literature": 2.0
                },
                "datatypes": {
                    "affected_pathway": 0.0,
                    "animal_model": 3.0,
                    "genetic_association": 15.0,
                    "known_drug": 0.0,
                    "literature": 1028.0,
                    "rna_expression": 1.0,
                    "somatic_mutation": 0.0
                },
                "total": 1047.0
            },
            "id": "ENSG00000167207-EFO_0003767",
            "is_direct": true,
            "target": {
                "gene_info": {
                    "name": "nucleotide binding oligomerization domain containing 2",
                    "symbol": "NOD2"
                },
                "id": "ENSG00000167207"
            }
        }
    ],
    "facets": {},
    "from": 0,
    "size": 1,
    "therapeutic_areas": [],
    "took": 21,
    "total": 1
}
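Because the association object is plain JSON, ranking the datatypes that contribute to an association is a short exercise. The sketch below works on a trimmed copy of the response above (only the fields it reads are kept). It also splits the association id back into its target and disease parts, which relies on the assumption that the ids involved contain no "-" themselves, as is the case for the ids shown here.

```python
# Trimmed copy of the NOD2 / IBD association shown above.
association = {
    "id": "ENSG00000167207-EFO_0003767",
    "association_score": {
        "overall": 1.0,
        "datatypes": {
            "affected_pathway": 0.0,
            "animal_model": 0.17046,
            "genetic_association": 1.0,
            "known_drug": 0.0,
            "literature": 0.3746566730517112,
            "rna_expression": 0.024180000000000004,
            "somatic_mutation": 0.0,
        },
    },
}

def contributing_datatypes(assoc):
    """Return (datatype, score) pairs with a non-zero score, highest first."""
    scores = assoc["association_score"]["datatypes"]
    nonzero = [(name, score) for name, score in scores.items() if score > 0]
    return sorted(nonzero, key=lambda pair: pair[1], reverse=True)

# Split on the first "-" only: <Ensembl gene id>-<disease id>.
target_id, disease_id = association["id"].split("-", 1)

print(target_id, disease_id)
for name, score in contributing_datatypes(association):
    print(f"{name}: {score:.3f}")
```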

The association object returned provides a good illustration of the underlying structure of the data in the Target Validation Platform. The JSON response includes a data array, and each association object in it is divided into:

  • Association score: for each target-disease pair we compute an association score indicating the strength of the available evidence connecting the target to the disease. The score can be used to rank target-disease links against each other, as can be seen when looking at inflammatory bowel disease on targetvalidation.org.
    The scoring is explained in detail elsewhere, but it is worth briefly summarizing here to help interpret the JSON response.
    For each data source we compute a score based on the evidence linking target to disease. Similar data sources are then grouped into datatypes, and an association score per datatype is computed from the data source scores using a harmonic sum. The overall association score for a target and a disease is likewise calculated as a harmonic sum of the individual datatype scores, with the contribution of each datatype adjusted by a heuristic weighting.

  • Disease: the EFO id and EFO information about the disease, including its position in the ontology hierarchy and a list of therapeutic areas.

  • Evidence count: the association object does not include the evidence itself, only a summary of how many pieces of evidence were found for each data source and datatype. Below we will look at how the API can be used to get further details on each evidence item.

  • id: a unique identifier for the association, in the form <Ensembl gene id>-<EFO id>.

  • is_direct: a true/false value indicating whether the connection between target and disease is directly observed or inferred from the relationship the current disease has with some other disease in the ontology.

  • target: information about the gene or protein, as well as its id.

        {
            "association_score": {
                "datasources": {...}, 
                "datatypes": {...}, 
                "overall": 1.0
            }, 
            "disease": {
                "efo_info": { ... },
                "id": "EFO_0003767"
            }, 
            "evidence_count": {
                "datasources": { ... }, 
                "datatypes": { ... }, 
                "total": 1047.0
            }, 
            "id": "ENSG00000167207-EFO_0003767", 
            "is_direct": true, 
            "target": {
                "gene_info": { ... }, 
                "id": "ENSG00000167207"
            }
        }
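To make the harmonic sum more concrete, here is a minimal sketch. It assumes one specific form that this post does not spell out: scores are sorted from highest to lowest, the i-th score (i = 1, 2, ...) is divided by i², and the result is capped at 1. The heuristic datatype weights are not given here either, so the sketch accepts optional weights that default to 1; these are placeholders, not the platform's values.

```python
def harmonic_sum(scores, weights=None, cap=1.0):
    """Capped harmonic sum of a {name: score} mapping (assumed form).

    Each score is first multiplied by its weight (default 1.0, a placeholder
    rather than the platform's heuristic weights). The weighted scores are
    sorted from highest to lowest and the i-th one is divided by i**2.
    """
    weights = weights or {}
    weighted = sorted(
        (score * weights.get(name, 1.0) for name, score in scores.items()),
        reverse=True,
    )
    total = sum(score / rank**2 for rank, score in enumerate(weighted, start=1))
    return min(total, cap)

# Data source scores for genetic_association in the NOD2 / IBD example.
genetic = {"eva": 0.0, "gwas_catalog": 0.7180094552025357,
           "uniprot": 0.0, "uniprot_literature": 1.0}
print(harmonic_sum(genetic))  # 1.0 + 0.718 / 4 exceeds the cap
print(harmonic_sum({"a": 0.5, "b": 0.4}))  # 0.5 + 0.4 / 4, below the cap
```

Dividing by the square of the rank means the strongest score dominates, while additional independent lines of evidence still nudge the total upwards.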

It is also possible to query the public/association/filter endpoint with just a target or a disease identifier, and choose to have just some part of the response returned. So to get the 3 diseases most strongly associated with the BRAF target:

http https://www.targetvalidation.org/api/latest/public/association/filter target==ENSG00000157764 size==3 field==disease  
{
    "data": [
        {
            "disease": {
                "efo_info": {
                    "efo_id": "http://www.ebi.ac.uk/efo/EFO_0000616", 
                    "label": "neoplasm", 
                    "path": [
                        [
                            "EFO_0000616"
                        ]
                    ], 
                    "therapeutic_area": {
                        "codes": [], 
                        "labels": []
                    }
                }, 
                "id": "EFO_0000616"
            }
        }, 
        {
            "disease": {
                "efo_info": {
                    "efo_id": "http://www.ebi.ac.uk/efo/EFO_0000311", 
                    "label": "cancer", 
                    "path": [
                        [
                            "EFO_0000616", 
                            "EFO_0000311"
                        ]
                    ], 
                    "therapeutic_area": {
                        "codes": [
                            "EFO_0000616"
                        ], 
                        "labels": [
                            "neoplasm"
                        ]
                    }
                }, 
                "id": "EFO_0000311"
            }
        }, 
        {
            "disease": {
                "efo_info": {
                    "efo_id": "http://www.ebi.ac.uk/efo/EFO_0000313", 
                    "label": "carcinoma", 
                    "path": [
                        [
                            "EFO_0000616", 
                            "EFO_0000311", 
                            "EFO_0000313"
                        ]
                    ], 
                    "therapeutic_area": {
                        "codes": [
                            "EFO_0000616"
                        ], 
                        "labels": [
                            "neoplasm"
                        ]
                    }
                }, 
                "id": "EFO_0000313"
            }
        }
    ], 
    "facets": {}, 
    "from": 0, 
    "size": 3, 
    "therapeutic_areas": [], 
    "took": 22, 
    "total": 763
}
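The response reports 763 associated diseases in total, while size limits how many are returned per call. One way to collect all of them is to request successive pages. The sketch below assumes that the endpoint accepts a from query parameter mirroring the from field in the response (check this in the interactive API documentation), and the default page size of 100 is an arbitrary choice. The paging logic takes the fetch function as an argument, so it can be tried with a fake fetcher and no network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

URL = "https://www.targetvalidation.org/api/latest/public/association/filter"

def braf_disease_page(start, size):
    """Fetch one page of BRAF associations (assumes a `from` query parameter)."""
    params = {"target": "ENSG00000157764", "field": "disease",
              "size": size, "from": start}
    with urlopen(f"{URL}?{urlencode(params)}") as response:
        return json.load(response)

def iter_results(fetch_page, size=100):
    """Yield every item of a paginated result set.

    fetch_page(start, size) must return a parsed response with "data" and
    "total" keys, like the association/filter responses shown above.
    """
    start = 0
    while True:
        page = fetch_page(start, size)
        yield from page["data"]
        start += size
        # Stop once every item has been requested, or if a page comes back empty.
        if start >= page["total"] or not page["data"]:
            break

# Stand-in for the API: 7 fake items served in pages of 3.
fake_items = [{"disease": {"id": f"fake_{n}"}} for n in range(7)]

def fake_fetch(start, size):
    return {"data": fake_items[start:start + size], "total": len(fake_items)}

ids = [item["disease"]["id"] for item in iter_results(fake_fetch, size=3)]
print(len(ids))
```

Against the live API, iter_results(braf_disease_page) would be the call to try, subject to the from assumption above.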

Get all the available evidence for a target-disease association

For a given target-disease pair, the object returned by the public/association/filter endpoint is a summary of all the evidence available.
The public/evidence/filter endpoint instead serves every single piece of information for a target-disease relationship in the form of evidence objects.
Evidence objects are an enriched version of our input data, described in the Open Targets JSON schema.

Continuing with the NOD2 and IBD example, we can construct a query to retrieve basic information about the evidence available. By default the API returns the 10 strongest pieces of evidence, but this can be changed with the size parameter.

http https://www.targetvalidation.org/api/latest/public/evidence/filter target==ENSG00000167207 disease==EFO_0003767 datastructure==simple  
{
    "data": [
        {
            "disease.efo_info.label": "ulcerative colitis", 
            "disease.id": "EFO_0000729", 
            "id": "442adb2d1b13c4f2ccf7389988aefde5", 
            "scores.association_score": "1.0", 
            "sourceID": "uniprot_literature", 
            "target.gene_info.symbol": "NOD2", 
            "target.id": "ENSG00000167207", 
            "type": "genetic_association"
        }, 
        {
            "disease.efo_info.label": "Crohn's disease", 
            "disease.id": "EFO_0000384", 
            "id": "b854a299c31156c3b14baf1b6638041c", 
            "scores.association_score": "1.0", 
            "sourceID": "uniprot_literature", 
            "target.gene_info.symbol": "NOD2", 
            "target.id": "ENSG00000167207", 
            "type": "genetic_association"
        }, 
        {
            "disease.efo_info.label": "Autosomal recessive early-onset inflammatory bowel disease", 
            "disease.id": "Orphanet_238569", 
            "id": "14be78930ae2c3275aaf6d2ebaf94b9d", 
            "scores.association_score": "0.4972", 
            "sourceID": "phenodigm", 
            "target.gene_info.symbol": "NOD2", 
            "target.id": "ENSG00000167207", 
            "type": "animal_model"
        }, 
        {
            "disease.efo_info.label": "Autosomal recessive early-onset inflammatory bowel disease", 
            "disease.id": "Orphanet_238569", 
            "id": "cb218a9ee90fc091c88662039c2591a9", 
            "scores.association_score": "0.43", 
            "sourceID": "phenodigm", 
            "target.gene_info.symbol": "NOD2", 
            "target.id": "ENSG00000167207", 
            "type": "animal_model"
        }, 
        {
            "disease.efo_info.label": "Autosomal recessive early-onset inflammatory bowel disease", 
            "disease.id": "Orphanet_238569", 
            "id": "f73ba0702491e426e76c4c5cb130c5ea", 
            "scores.association_score": "0.4203", 
            "sourceID": "phenodigm", 
            "target.gene_info.symbol": "NOD2", 
            "target.id": "ENSG00000167207", 
            "type": "animal_model"
        }, 
        {
            "disease.efo_info.label": "Crohn's disease", 
            "disease.id": "EFO_0000384", 
            "id": "cb48f4e17df89aaf89a3cf8e025d981c", 
            "scores.association_score": "0.414", 
            "sourceID": "europepmc", 
            "target.gene_info.symbol": "NOD2", 
            "target.id": "ENSG00000167207", 
            "type": "literature"
        }, 
        {
            "disease.efo_info.label": "inflammatory bowel disease", 
            "disease.id": "EFO_0003767", 
            "id": "d82be30eda25577a247c5eabc5e00807", 
            "scores.association_score": "0.376", 
            "sourceID": "europepmc", 
            "target.gene_info.symbol": "NOD2", 
            "target.id": "ENSG00000167207", 
            "type": "literature"
        }, 
        {
            "disease.efo_info.label": "Crohn's disease", 
            "disease.id": "EFO_0000384", 
            "id": "95fbdb0bd5e06d0883ae387b3b663c3d", 
            "scores.association_score": "0.35600000000000004", 
            "sourceID": "europepmc", 
            "target.gene_info.symbol": "NOD2", 
            "target.id": "ENSG00000167207", 
            "type": "literature"
        }, 
        {
            "disease.efo_info.label": "Crohn's disease", 
            "disease.id": "EFO_0000384", 
            "id": "4392f242b6646459bf58505178739416", 
            "scores.association_score": "0.34", 
            "sourceID": "europepmc", 
            "target.gene_info.symbol": "NOD2", 
            "target.id": "ENSG00000167207", 
            "type": "literature"
        }, 
        {
            "disease.efo_info.label": "Crohn's disease", 
            "disease.id": "EFO_0000384", 
            "id": "6a41a2481cbb047ebf61bd2eaf3fde86", 
            "scores.association_score": "0.314", 
            "sourceID": "europepmc", 
            "target.gene_info.symbol": "NOD2", 
            "target.id": "ENSG00000167207", 
            "type": "literature"
        }
    ], 
    "facets": null, 
    "from": 0, 
    "size": 10, 
    "therapeutic_areas": [], 
    "took": 387, 
    "total": 1047
}
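With datastructure==simple each evidence item is a flat dictionary, so tallying the evidence by data source is straightforward. In the output above scores.association_score is returned as a string, so the sketch converts it to a float before comparing. The four sample items are copied (with fields trimmed) from the response above, and the 0.45 threshold is an arbitrary illustration.

```python
from collections import Counter

# Four items copied from the evidence/filter response above (fields trimmed).
evidence = [
    {"sourceID": "uniprot_literature", "type": "genetic_association",
     "disease.id": "EFO_0000729", "scores.association_score": "1.0"},
    {"sourceID": "phenodigm", "type": "animal_model",
     "disease.id": "Orphanet_238569", "scores.association_score": "0.4972"},
    {"sourceID": "europepmc", "type": "literature",
     "disease.id": "EFO_0000384", "scores.association_score": "0.414"},
    {"sourceID": "europepmc", "type": "literature",
     "disease.id": "EFO_0003767", "scores.association_score": "0.376"},
]

# How many evidence items each data source contributed.
per_source = Counter(item["sourceID"] for item in evidence)

# Scores arrive as strings, so convert them before filtering numerically.
strong = [item["disease.id"] for item in evidence
          if float(item["scores.association_score"]) >= 0.45]

print(per_source.most_common())
print(strong)
```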

This tutorial only scratches the surface of what is possible with the Target Validation REST API. Very complex queries can be built to cover many usage scenarios; we will cover those in more in-depth tutorials on this blog.

Consider getting an API token

There is a fair-usage limit on calls to the REST API. If the quota is exceeded, a 429 error is returned, and the Retry-After header of the response indicates how long you need to wait before making a new call.
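A client can respect this limit by waiting for the time given in Retry-After before trying again. The sketch below uses Python's standard urllib. It assumes Retry-After holds a number of seconds; HTTP also allows a date in that header, which this sketch does not parse and instead replaces with a default wait. The retry count and default wait are arbitrary choices.

```python
import json
import time
from urllib.error import HTTPError
from urllib.request import urlopen

def retry_delay(retry_after, default=1.0):
    """Seconds to wait, from a Retry-After header value (None if absent)."""
    try:
        return max(0.0, float(retry_after))
    except (TypeError, ValueError):
        # Header missing or in HTTP-date form: fall back to a fixed wait.
        return default

def get_json_with_retry(url, max_attempts=5):
    """GET a URL and parse the JSON body, sleeping and retrying on HTTP 429."""
    for attempt in range(1, max_attempts + 1):
        try:
            with urlopen(url) as response:
                return json.load(response)
        except HTTPError as err:
            # Re-raise anything that is not a rate-limit error, or the last one.
            if err.code != 429 or attempt == max_attempts:
                raise
            time.sleep(retry_delay(err.headers.get("Retry-After")))
```

For example, get_json_with_retry("https://www.targetvalidation.org/api/latest/public/utils/stats") returns the stats shown earlier as a dictionary.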

The default usage limit will typically not affect normal use of the REST API. However, if you are planning to make a large number of requests or are building an application on top of our API, you should email us and obtain an API key. Having a key will allow you to make many more requests than an anonymous user and will also help us in the long term to make the API better for you. (Update: have a look at our post explaining how to use the API key for more details.)

Importantly, we will never track the content of your API requests; we only monitor the overall usage of the API.

Also, don't forget to let us know what you think of the API after using it - we'd love to hear how we could make the API more useful in our next release.