Code breaking and the Human Genome Project
On the 20th anniversary of the completion of the Human Genome Project, Ian Dunham, Open Targets Director and a leading scientist in the sequencing of the human genome, reflects on the early years of the Wellcome Sanger Institute after visiting Bletchley Park, the centre of wartime codebreaking activities.
The story of Bletchley Park could have been pulled from the pages of a ripping yarn. A diverse group of British intellectuals, brought together in an English country house under military secrecy, solved a mission-critical problem at a time of existential crisis for the country. Strolling through the exhibition describing how the operation scaled up from its initial codebreaking mission to intelligence generation on an industrial scale, I was struck by an eerie sense of déjà vu: it recalled the great project of my life, the sequencing of the human genome.
Since the operations at ‘Station X’ were declassified, the wartime work at Bletchley Park, and notable individuals such as Alan Turing, have become well known through popular films and books. The intelligence obtained through their efforts is thought to have substantially shortened the war [1], and some of the technical work is now viewed as a forerunner of modern electronic computing. The Park is preserved as a heritage museum [2]. As a British scientist with an interest in computers, brought up on stories of wartime heroism and living less than an hour away, I was perhaps remiss not to have visited earlier.
Now I don’t want to claim too much. The background and impetus of the two projects were clearly very different. However, there are notable parallels both in the way the science was conducted and, at least for the genome project in the UK, the particular physical settings and sensibilities.
A sense of place
Bletchley Park comprises a 19th-century mansion set in a large estate. It was purchased in 1938 by the Head of the Secret Intelligence Service, Admiral Sir Hugh Sinclair, to house the UK Government Code and Cypher School [3]. Interestingly, he bought it himself because the Government did not have the funds available. In 1993, the Wellcome Trust was searching for a site for a substantial UK contribution to the sequencing of the human and other genomes under the direction of John Sulston, after it became clear that the UK Government was unable to support an effort of the necessary scale. Several sites around Cambridge were considered, but attention eventually settled on Hinxton Hall, a 17th-century country house set in a landscaped estate [4,5]. I later learnt that the house and grounds had also been requisitioned to billet British troops in World War 2, and that the stepfather of my long-time collaborator John Collins had been based there for a while during that time.
I first visited the site in 1993 with David Bentley and colleagues, since we were to provide the human genome mapping expertise for the endeavour. Although the estate was undoubtedly pretty, the Hall had seen better days and was rather dominated by a 1950s building that had provided laboratories for the light engineering company that previously owned the site. After some renovation, that building, with its single-glazed metal windows and old wooden benches, was to be our home until a more modern building could be constructed. Later, temporary prefabricated buildings (Portakabins) were added to house the growing workforce. I don’t remember now whether we called them huts, but they were essentially the same as the huts built at Bletchley Park for its expansion. We were able to use the rooms in the Hall, and the overall feel of the place was much closer to wartime than to the modern conference facilities that occupy the renovated site today.
Make do and map
The early days of the Sanger Centre (as it was initially called) had the air of a great adventure, with quite a lot of British ‘making do’ with whatever resources were to hand. Although we had some of the latest cutting-edge kit in terms of sequencers and robots, a skilled practitioner could often perform cloning and mapping methods by hand at higher throughput. Protocols would be worked out on a small scale first, scaled up by hand and perhaps later implemented on a robot built to specification by the engineers. The same went for analysis tools. For our first maps we would draw out contigs of bacterial clones on graph paper and orient them to regions of the chromosome using wallcharts. Only later did we write the software to build the maps on computers [6,7].
The early processes at Bletchley Park were similar: codes were broken by hand, and the resulting intelligence was interpreted using banks of index cards and wall maps. Only later were the first computers introduced to automate intelligence gathering.
Assembling a team
At the core of all this, of course, were the people who provided both the intellectual input and the perseverance to make things happen. A diverse group of people was recruited to Bletchley Park, ranging from university academics through engineers and clerical staff to military personnel. My impression from their testimonies is that they must have felt a great sense of common purpose and of the importance of their work, despite the necessary secrecy and the drudgery sometimes involved.
For the genome sequencing at the Sanger Centre, we too brought together diverse people to achieve our goals. Computer scientists and mathematicians, biologists, engineers, laboratory technicians and clerical staff were all recruited to provide the expertise for each of the parts that would make up the whole. Many of the staff were very young, and, particularly for the task of examining individual sequences and ‘finishing’ each segment, we recruited enthusiastic young graduates straight out of university.
There was also a sense of contributing to a great endeavour, which fostered great camaraderie. Parties celebrating each milestone were common, accompanied by the obligatory cake and ‘drinks and nibbles’, and the grounds provided ample opportunities for sport and inspiring strolls. Each Christmas gave the leaders an opportunity to make fools of themselves in the annual pantomime, a tradition we had inherited from the MRC Laboratory of Molecular Biology (LMB) in Cambridge and perhaps from earlier scientists who had known ENSA (the Entertainments National Service Association) during service in the War.
Breaking the (DNA) code
The task of interpreting DNA sequence is often thought of as analogous to codebreaking, and many of the techniques used have common antecedents. However, from the point of view of the genome project, perhaps the most important parallel with wartime intelligence gathering was the challenge of scaling up processes that already existed [8]. Having worked out how to break Enigma or another cipher once, how could the Bletchley Park scientists do it again and again each day as the machine settings changed? How could they use other intelligence to put the messages into context and interpret German troop movements or chains of command? Turning intelligence gathering into an industrial process required ingenuity, automation engineering and brute force [9].
The processes of mapping, sequencing and then interpreting the human genome were no different. Methods that had routinely been performed on a small scale needed to be scaled up, automated where necessary and quality controlled. We had to learn to handle the data efficiently and to piece together sequence at a scale never achieved before. We had to learn how to identify genes effectively. Remember that while this was going on, there was still considerable doubt even about how many genes the human genome contained [10,11].
The Bletchley Park operation would not have succeeded without weighty backing. Many high up in the military considered it a sideshow until the intelligence it obtained proved its worth in concrete, actionable insights. But it did have believers in the UK government, and after one visit to the site, Churchill promised the resources necessary to maximise the value of its intelligence gathering.
The Human Genome Project was initially seen as largely an American endeavour, and in the UK there was some, but insufficient, backing from the Government. Paradoxically, the UK had many influential scientists who had pioneered the new field of genomics. The chain-terminator sequencing method that would dominate the project had been invented here by Fred Sanger and exploited by LMB scientists including Bart Barrell and Alan Coulson. One method for mapping eukaryotic genomes that would be used widely for the human genome had been pioneered in the nematode C. elegans by John Sulston and Alan Coulson. It was against this background that the Wellcome Trust, encouraged by James Watson in his role leading the NIH’s Genome Institute, came to realise the importance of supporting the project in the UK by founding the Sanger Centre. Not only did this establish a credible second front for the project in Europe, but it also entrenched the Wellcome Trust’s standing on the international stage, from where it could influence many future projects.
Although I felt these parallels very strongly during my visit to Bletchley Park, and have tried to give some personal insight into the history of human genome sequencing at Hinxton, there is one area where the two could not be more different. By its nature, wartime intelligence must be kept secret from one’s enemies, or its value is undermined. Yet the very existence of the wartime codebreakers’ activities was kept secret for many years after the war and only became well known in the 1990s, arguably long after there was any risk to national security.
In contrast, John Sulston’s philosophy was to release our human code into the public domain as soon as possible, and under the Bermuda Principles the members of the public sequencing consortium agreed to do this as soon as each unit of sequence was assembled [12]. This principle of rapid release of foundational data has since been extended to other large-scale projects in genomics, because more is to be gained by having more scientists working on the data than by keeping it secret until, or even after, publication [13]. But perhaps, after all, even in this respect the two projects are not so different. Both wanted the artefact they were producing to make a difference in the right hands. It is just that the right audience was different.
1. The Influence of ULTRA in the Second World War.
3. Morrison, K. ‘A Maudlin and Monstrous Pile’: The Mansion at Bletchley Park, Buckinghamshire. Transactions of the Ancient Monuments Society 53 (2009).
4. Hinxton Hall history. Hinxton Hall Conference Centre.
5. Hinxton Hall (Tube investment limited), Hinxton. List entry 1330969. Historic England.
6. Soderlund, C. & Dunham, I. SAM: a system for iteratively building marker maps. Comput Appl Biosci 11, 645–655 (1995).
7. Soderlund, C., Longden, I. & Mott, R. FPC: a system for building contigs from restriction fingerprinted clones. Comput Appl Biosci 13, 523–535 (1997).
8. Searls, D. B. Linguistic approaches to biological sequences. Bioinformatics 13, 333–344 (1997).
9. The Intelligence Factory. Bletchley Park.
10. Powledge, T. M. Bear market slashes at human genome. EMBO reports 1, 212–214 (2000).
11. Dunham, I. The gene guessing game. Yeast 17, 218–224 (2000).
12. Bermuda Sequence Policies Archive.
13. Sharing Data to Build a Medical Information Commons: From Bermuda to the Global Alliance. Europe PMC.