Open Targets Genetics is a portal to explore variant-gene-trait associations based on data from UK Biobank and the GWAS Catalog. The homepage allows you to search for a specific gene, variant or trait and explore the data associated with that search. The portal also provides programmatic access to the underlying data for users who want to fine-tune their results or automate their queries.
This blog post is the first in a series regarding how you can use the portal, and specifically the GraphQL interface, to explore, access and use the data available. This post provides an introduction to GraphQL and some of its features. A follow-up post will explore in detail how you can use Open Targets Genetics to answer a biological question.
This guide supplements the official documentation and acts as a walk-through to help you become familiar with finding data programmatically. The underlying technology we'll be working with is called GraphQL. The basic idea behind GraphQL is that you should be able to request information in the form in which you need it. A traditional REST endpoint returns entities as they are represented by the program's maintainer, which do not necessarily reflect how end users want to structure the data. In GraphQL, the user specifies the representation of the data by combining the elements made available by the program.
There are two ways to access the Genetics portal programmatically using GraphQL:
- making web requests, using either command line tools such as wget or GUI programs like Postman; or
- using the provided browser endpoint.
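Outside the browser, a GraphQL query is typically sent as an HTTP POST whose JSON body carries the query text and, optionally, its variables; this is what a tool such as wget or Postman does when pointed at the API. The sketch below uses only Python's standard library. The endpoint URL is an assumption (the address the Genetics API used at the time of writing) and should be checked against the current documentation.

```python
import json
import urllib.request

# ASSUMPTION: endpoint address; check the Open Targets Genetics docs for the current URL.
GENETICS_API_URL = "https://api.genetics.opentargets.org/graphql"


def build_payload(query, variables=None):
    """Encode a GraphQL query (and optional variables) as a JSON request body."""
    body = {"query": query}
    if variables is not None:
        body["variables"] = variables
    return json.dumps(body).encode("utf-8")


def post_graphql(query, variables=None, url=GENETICS_API_URL):
    """POST a query to a GraphQL endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=build_payload(query, variables),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `post_graphql('query gene { geneInfo(geneId: "ENSG00000012048") { symbol } }')` would need network access and a live endpoint; `build_payload` on its own shows the exact body being sent.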
Using the provided browser endpoint gives you lots of extra features to help structure your queries and explore the data available, so we'll focus on that in this guide.
The GraphQL browser
As an example, let’s imagine we want to get all GWAS and UK Biobank studies associated with BRCA1 (ENSG00000012048).
When you navigate to the browser you'll see the following interface:
It doesn't look like much, but as we'll see, it's quite powerful. Important areas include:
- the triangular 'play' button at the top, which executes queries written in the text area on the left-hand side of the screen;
- the History button in the menu bar, which allows us to select, edit and execute previous queries; and
- the 'Docs' button on the top right of the menu bar, which opens the schema documentation.
Let's start by selecting the 'Docs' button. Once we click on the Query item under the Root types list, we are presented with the view below.
Here we can see a number of entities listed under Fields, starting with search, genes, etc. We're looking for a gene, so we want a GraphQL field that has a return type of Gene. In this case geneInfo looks promising. Let's look at its signature in more detail:
geneInfo(geneId: String!): Gene
This tells us that we can query something called geneInfo, and it will return to us a Gene. In order for it to do this, we need to provide it with a geneId argument of type String. The exclamation point tells us that we must provide this argument.
Knowing this, we can start to write our query:
You'll recall that the root of the query tree when we first opened the docs menu was called Query, so we start our query with the query keyword.
gene is the name we have given to this particular query; after we have executed it, there will be an entry in the history with this same name, so we can run the same query again if necessary. Inside the first set of curly braces is the meat of our query: we use the field we identified above and complete its signature with the Ensembl ID we want to investigate.
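Written out as text, the query described above might look like the following sketch. It requests only the `symbol` field, which is the selection discussed next; the Python wrapper is there only to show that the query is ordinary text placed in a JSON request body.

```python
import json

# Named operation "gene": geneInfo takes the required geneId argument and
# returns a Gene, from which only the symbol field is requested.
GENE_QUERY = """
query gene {
  geneInfo(geneId: "ENSG00000012048") {
    symbol
  }
}
"""

# The JSON body a client would POST to the GraphQL endpoint.
request_body = json.dumps({"query": GENE_QUERY})
```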
We're not ready to execute the query just yet, however, because now we're getting to the strength of GraphQL: we need to tell the server which of the available fields we'd like returned to us. In the image above we have specified that we would like the symbol, and we're being assisted by the pop-up, which shows other fields we might choose. That's all well and good, but how do we know which fields are available to us?
Let's go back to the documentation panel!
We want a Gene object returned to us, so we specify the fields from that object that we want. Selecting Gene in the docs menu shows us the fields from which we can choose: for each one we see the field name, its return type and a description. Once again, the exclamation point denotes a non-null type, but here, rather than marking a required argument, it is a promise that if we request that field we'll get a value back. Fields without an exclamation point may return null. We can now complete our query by choosing from the fields available on Gene and executing it with the triangular 'play' button.
As you can see, the results returned in the right-hand panel correspond to the fields specified in the left-hand panel. So far, so good!
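The same correspondence holds when a script reads the response: under the top-level `data` key there is a `geneInfo` object containing exactly the fields that were requested. Because the signature `geneInfo(geneId: String!): Gene` has no exclamation point on `Gene`, the result itself may be null, so client code should check for it. A minimal sketch, assuming the standard GraphQL response layout (`data` plus an optional `errors` list); the dictionaries in the usage lines are illustrative, not real API output.

```python
def gene_symbol(response):
    """Return the gene symbol from a decoded geneInfo response, or None if absent.

    GraphQL reports problems in a top-level "errors" list; geneInfo is nullable
    (return type Gene, without "!"), so it may come back as None.
    """
    if response.get("errors"):
        messages = "; ".join(err.get("message", "") for err in response["errors"])
        raise RuntimeError(f"GraphQL error: {messages}")
    gene = (response.get("data") or {}).get("geneInfo")
    if gene is None:
        return None
    # .get() in case symbol is declared nullable in the schema.
    return gene.get("symbol")


print(gene_symbol({"data": {"geneInfo": {"symbol": "BRCA1"}}}))  # BRCA1
print(gene_symbol({"data": {"geneInfo": None}}))  # None
```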
Now let's see if we can get the studies relating to that gene.
Again, we can look at the documentation to find the field that we need. Since we're using a specific gene as our starting point, we want a field that accepts geneId as an argument and can give us Study objects. The following field looks promising, as it takes the correct argument and returns something called StudiesAndLeadVariantsForGene:
studiesAndLeadVariantsForGene(geneId: String!): [StudiesAndLeadVariantsForGene!]!
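Note that this time the return type StudiesAndLeadVariantsForGene is surrounded by square brackets, which indicates that a list of results will be returned. The exclamation points carry the same promise as before: the inner one means no entry in the list will be null, and the outer one means the list itself will not be null. As before, we can click on StudiesAndLeadVariantsForGene to examine the fields we can select.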
There are several options, one of which is study, so we'll select that. Study in turn has additional fields we could explore, but to get started we can execute the query as is.
Because we didn't specify which fields of study we were interested in, the browser automatically populated the query with all the fields on the Study object. We can see in the right-hand panel that, as well as the geneInfo field, the results now include a studiesAndLeadVariantsForGene field containing the list of studies.
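Filling in the fields of `study` automatically is a convenience of the browser. In GraphQL itself, a field whose type is an object, such as `Study`, must be given an explicit selection of subfields, so a script has to list them. The sketch below combines both top-level fields in one `gene` query; the `Study` subfields `studyId` and `traitReported` are assumptions about the schema and should be confirmed in the Docs panel. The helper walks the `[StudiesAndLeadVariantsForGene!]!` list, which the schema promises is never null and never contains null entries.

```python
# Both top-level fields are resolved in a single request.
# ASSUMPTION: studyId and traitReported are fields of Study; confirm in the Docs panel.
GENE_AND_STUDIES_QUERY = """
query gene {
  geneInfo(geneId: "ENSG00000012048") {
    symbol
  }
  studiesAndLeadVariantsForGene(geneId: "ENSG00000012048") {
    study {
      studyId
      traitReported
    }
  }
}
"""


def study_ids(data):
    """Collect study IDs from the "data" object of a GENE_AND_STUDIES_QUERY response.

    The type [StudiesAndLeadVariantsForGene!]! guarantees a non-null list with
    no null entries, so neither needs a None check.
    """
    ids = []
    for entry in data["studiesAndLeadVariantsForGene"]:
        study = entry.get("study")
        if study is not None:  # defensive: nullability of study is not confirmed here
            ids.append(study["studyId"])
    return ids
```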
Exploring the past
Before we go further, it's time to point out the useful History function. If we click on the History button at the top of the page, we'll see the following:
The history lists all our executed queries in reverse chronological order (most recent first), each named after the query we executed. Because we have executed the gene query twice, it appears twice in our history. The older, lower entry is the query asking only for geneInfo, whereas the topmost entry is our revised query, which also includes study information. It's a good idea to give queries informative names to make the most of this feature.
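Operation names are useful outside the browser as well. Under the common GraphQL-over-HTTP convention, a single request document may contain several named operations, and the `operationName` member of the JSON body tells the server which one to run. A minimal sketch that only builds the request body; the second operation name, `geneOnly`, and the `studyId` subfield are hypothetical choices for illustration.

```python
import json

# A document holding two named operations.
# HYPOTHETICAL: the name "geneOnly" and the Study subfield studyId are illustrative.
DOCUMENT = """
query gene {
  geneInfo(geneId: "ENSG00000012048") { symbol }
  studiesAndLeadVariantsForGene(geneId: "ENSG00000012048") { study { studyId } }
}

query geneOnly {
  geneInfo(geneId: "ENSG00000012048") { symbol }
}
"""


def body_for(operation_name):
    """Build a request body asking the server to run one named operation."""
    return json.dumps({"query": DOCUMENT, "operationName": operation_name})
```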
There is one final feature to discuss before finishing this post. We have had to write out the Ensembl ID twice: once as the geneId for the gene information and once for the studies. If we want to look for a different gene, we'll have to update the query in two places, which is both inconvenient and error-prone. Fortunately, GraphQL allows us to use a variable to hold the value we want to change and use that same value in multiple places.
In the image above we have made three changes. After the query name we have declared a variable, $gene, of type String!:
query gene($gene: String!)
We have then replaced both occurrences of the Ensembl ID with $gene. This means that anywhere in our query we can write $gene where we were asked to provide a String! argument. But where does this value come from? At the bottom left-hand side of the screen is another panel called Query Variables. Here we can set the value of gene to our desired ID, written as JSON, for example {"gene": "ENSG00000012048"}. Now, when we execute our query, this value will be substituted wherever $gene appears. If we want to search for a different gene, we only have to update our code in one place!
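The parameterised query can be sent from a script in the same way: the Query Variables panel corresponds to the `variables` member of the JSON request body. A minimal sketch that builds the body for any Ensembl ID; the `studyId` subfield of `Study` is an assumption about the schema.

```python
import json

# $gene is declared once, typed String! to match the geneId arguments,
# and used wherever the Ensembl ID previously appeared.
# ASSUMPTION: studyId is a field of Study; confirm in the Docs panel.
GENE_QUERY = """
query gene($gene: String!) {
  geneInfo(geneId: $gene) {
    symbol
  }
  studiesAndLeadVariantsForGene(geneId: $gene) {
    study {
      studyId
    }
  }
}
"""


def gene_request_body(ensembl_id):
    """JSON body for GENE_QUERY; "variables" plays the role of the Query Variables panel."""
    return json.dumps({"query": GENE_QUERY, "variables": {"gene": ensembl_id}})


# Switching genes now means changing only the variable value.
brca1_body = gene_request_body("ENSG00000012048")
```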
I hope this short guide has proved useful in introducing you to the flexibility of GraphQL and to the Genetics portal more broadly! You can find further GraphQL examples and tutorials in the documentation.
The Open Targets Platform also now provides a GraphQL endpoint where the same techniques learnt here can be applied to explore the platform's data.