Ian Dunham has been a pillar of Open Targets since its inception. I sat down with him to understand how his career led him to become the Director of Open Targets, and what he envisions for the future of the consortium.
When did you first become interested in genetics and genomics?
I initially thought I wanted to do medicine, but after a trip to a hospital with a consultant, I quickly changed my mind. I studied biochemistry at university because I was interested in molecular biology and how organic chemistry is used in living systems, and I’d been fascinated by Jim Watson’s The Double Helix. This was the 80s, after the recombinant DNA revolution but before anything like genomics existed, so in a sense this was quite modern. But to be honest, I had no idea what I was getting myself into.
I was fascinated by recombinant DNA and cloning, and the ability to take viruses and bacteria and manipulate pieces of the genetic code to create new things. This led me to my undergraduate degree project, where I made cosmid libraries for the Factor IX gene — the clotting factor gene mutated in haemophilia A — cloning relatively large pieces of human DNA. I liked the idea that I was using the human genome to improve medicine and disease outcomes.
I was struggling to find a PhD position, when Rodney Porter, the Nobel Prize-winning biochemist, came to my rescue. I hadn’t actually met him, but he got in touch to offer me a studentship working on the genes for complement (an innate immune response), and I accepted. Sadly, I never got the chance to work with him because in the time it took me to start the PhD, he passed away.
In the second year of my PhD working with Duncan Campbell, I was interested in the structure of the HLA region: it has Class I and Class II genes, associated with many diseases such as autoimmune diseases, as well as genes for some of the complement components, but we didn’t know how everything was arranged. A newly-invented technique at the time was pulsed-field gel electrophoresis, which enables you to look at very long pieces of DNA — over 1 million base pairs — in one restriction enzyme fragment. Using pulsed-field gel electrophoresis, my task was to make a giant restriction map of this region, which turned out to be over 4 million bases in size. I succeeded, and was able to make a very detailed map at scale .
How did you become involved in the sequencing of the human genome?
It was a subject of discussion at the time, particularly when a new method to clone large pieces of DNA in yeast cells was invented. So I contacted Maynard Olson in Saint Louis, to ask if I could do a project in his lab. When I suggested we could clone a whole chromosome using Yeast Artificial Chromosomes (YACs), Maynard thought I was crazy. But in fact, three years later we started working on the Human Genome Project cloning human chromosome 22 with YACs.
In the meantime, I used YACs to clone the piece of DNA that forms the centromere of human chromosomes. We knew that this piece of the genome contained a lot of repeat sequences, but we didn’t know how long they were.
From there, things moved quite quickly. I moved back to the UK, and with postdoc funding from the Wellcome Trust working with David Bentley, I decided to use these YACs to make a map of a whole human chromosome. We started with chromosome 22, since it was thought to be the smallest chromosome, although we now know that chromosome 21 is smaller. We published our map of human chromosome 22 in 1995 , and this was really the beginning of large scale cloning for genome sequencing.
While this was happening, John Sulston was mapping the nematode genome, and the Wellcome Trust, motivated by Jim Watson, funded him to set up a human genome sequencing centre in the UK. They needed people who knew about the human genome, so in 1993, David Bentley and I joined them in the formation of the Sanger Centre (now the Wellcome Sanger Institute). There, my group continued to work on chromosome 22. One issue we ran into was how to generate enough reliable material for the sequencing. We stopped using YACs and switched to a new type of vector, Bacterial Artificial Chromosomes (BACs), which were much more amenable to insertion of human DNA.
At this point, several groups had started to work on chromosome 22, including a group at the Keio University School of Medicine in Japan, a group at the University of Pennsylvania School of Medicine, and a group at the University of Oklahoma. Rather than competing with each other on these different pieces, we decided to work together to stitch up the whole sequence of the chromosome. I became responsible for coordinating these efforts, which was made slightly easier by the advent of the Internet and email. We published the sequence just before the millennium, in December of 1999 . It was the first complete human chromosome — or at least substantial part of the chromosome — and a major success.
Where did you go from there?
Our work on chromosome 22 became a model for the rest of the public human genome sequencing. At Sanger, we continued to sequence other chromosomes, but we also used chromosome 22 as a model system for further analyses. Essentially, we said: we have the sequence, now what?
We did a couple of things. One was to try to create the most detailed map of gene annotation. The other was to think about how we could connect sequence with human variation. We started by sequencing many individuals to make a comprehensive map of human variation; this work was the genesis of single nucleotide polymorphism maps. We used the data to understand the pattern of variation across those individuals, which led us to create a linkage disequilibrium map of chromosome 22, the first of its kind .
This work enabled us to explore the functional effects of genetic variation. We began to investigate epigenetic signals, aiming to convert our knowledge of the genome sequence into an understanding of how the genome functions as a unit. Thanks to a grant from the US National Genome Research Institute, and in collaboration with the ENCODE consortium, we started by working on a relatively small scale, looking at just 44 different regions across the genome to identify how the histone modifications that mark out enhancers and promoters change in different cell types .
How did you shift to working in bioinformatics?
After I started working on ENCODE, the Sanger Institute changed the focus of its research, and so I, along with many of the groups involved in the mapping, sequencing and epigenetic work, were made redundant. I could have carried on what I was already doing elsewhere, but I decided to take a different course. I have always been very interested in computation and software, and so I retrained in bioinformatics.
I spent a year learning different programming languages, following tutorials, getting stuck into various data problems, and then working on functional genomics in the Ensembl team at EMBL’s European Bioinformatics Institute (EMBL-EBI). I joined Ewan Birney’s group , and as luck would have it, I ended up working on the new version of the ENCODE project, looking for epigenetic signals across the whole genome in a variety of cell types. The problem there was: how do you integrate all the data together? I acted as the project manager, coordinating the data analysis for a whole group of papers, including the ENCODE project’s major publication in 2012 .
How did the Open Targets project come about?
Our work was slowly bridging the gap in our understanding of genetic and epigenetic signals, leading us to the next hurdle: how do you convert the information generated in the genome project and genetics studies into useful information for drug discovery?
Starting from discussions between Janet Thornton of EMBL-EBI, Mike Stratton of the Wellcome Sanger Institute, and Patrick Vallance of GSK, we laid out what a collaboration between our institutions could look like, forming what is now Open Targets. At the time, it was the Centre for Therapeutic Target Validation, a name that came about because nobody could think of a good one. I take some credit for coming up with the name Open Targets, which we adopted in 2016.
I was appointed Science Director, tasked with fashioning the research programme. I had to learn about the drug discovery process, which took me back to some of my early biochemistry training. The challenge of Open Targets is to bridge the gap between drug discoverers, who tend to concentrate on one very specific mechanism, and the wider systematic genomics and genetics, which is useful to understand which targets are likely to be relevant for disease. My contribution to this has been to create a robust data integration system, making sure we bring in the right data, in a comprehensive way, and making sure we have the correct systems for that. I think we now have a well thought-out plan, bringing all the relevant data together and generating new data to address specific questions.
Now as Director, how do you see Open Targets going forward?
On the public side, I think that we’ve made a real impact with the Open Targets Platform and Open Targets Genetics, and we want to ensure that those will continue to be maintained.
Within the consortium, the driving question behind our work is: how do we take the existing research programme and convert it into targets or outputs that can be useful to our partners and feed into the drug discovery process? We want to use our experimental research programme to deliver cutting edge experiments and produce information about targets in disease.
From the genetics side, this includes understanding how to take forward common disease GWAS to identify the genes that are likely to be targets, as well as gene modulation and synthetic lethality screens in cancer or in neurodegenerative disease. Following on from that, we need to pinpoint the cellular locations of the effects, to understand where the genes we’ve identified are acting to cause disease and where we can intervene to modulate the disease. In that line of investigation, approaches like single cell sequencing or single cell expression quantitative traits will be essential.
There are bioinformatics associated with all of those processes, to make sense of the data we are generating. Our next steps in this area are to improve the way that we integrate data, to better identify the correct targets in disease. This will involve developing our machine learning capabilities, as well as combining the experimental data we are generating with the systematic integration of data from public resources.
Ultimately, our aim is to create a complete package of data pinpointing the most likely targets to be of interest in disease, including parameters such as tractability, cellular location, network expansion, etc. But the question is such an open-ended one that we won’t be able to address all aspects; we do our best to address individual problems, while bearing in mind how all the parts fit together.
And finally, what has been the most memorable moment of your career?
My involvement with the Human Genome Project has afforded me a few brushes with fame. In 1999, after we published the sequence of chromosome 22, I was invited to give a talk in Lisbon on live television. That evening, we went out for a drink, and when I went up to the bar, a man I’d never met before turned to me and said “You were that guy on television, weren’t you? Let me get you a drink!” And I thought: this is fame!
None of that has happened to me since, but it does show that there are occasionally unexpected rewards for being involved in science.
1. Dunham, I., Sargent, CA., Trowsdale, J. and Campbell, RD. Molecular mapping of the human major histocompatibility complex by pulsed-field gel electrophoresis. PNAS October 1, 1987 84 (20) 7237-7241
2. Collins JE, Cole CG, Smink LJ, et al. A high-density YAC contig map of human chromosome 22. Nature. 1995 Sep;377(6547 Suppl):367-379. DOI: 10.1038/377367a0. PMID: 7566101.
3. Dunham, I., Hunt, A., Collins, J. et al. The DNA sequence of human chromosome 22 . Nature 402, 489–495 (1999). DOI.10.1038/990031
4. Dawson, E., Abecasis, G., Bumpstead, S. et al. A first-generation linkage disequilibrium map of human chromosome 22 . Nature 418, 544–548 (2002). DOI.10.1038/nature00864
5. The ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816 (2007). DOI.10.1038/nature05874
6. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome . Nature 489, 57–74 (2012). DOI.10.1038/nature11247