Last summer, two students from the Massachusetts Institute of Technology (MIT) joined Open Targets for a summer internship through the MIT International Science and Technology Initiatives (MISTI) programme.
Babatomiwa Adebiyi is now a final-year undergraduate student in computer science. We recently met virtually, and he told me about his project and his experience at Open Targets.
What was it like to intern at Open Targets remotely?
Everyone in the team was welcoming, and I knew that if I had any issues then I could ask anyone. Ellie McDonagh and David Ochoa were fantastic supervisors; they helped me work through and understand the problems I was trying to solve. But to be honest, since I was working from home and a lot of the work was individual, it sometimes felt more like a school project than an internship.
The end of my internship was even more challenging. I received a grant through the Experiential Learning Opportunity fund (ELO) to continue working on my project for a few extra months, but that meant juggling my university courses with the internship. I was also based in the USA, so I was working remotely on a completely different schedule.
What were you working on during your internship?
The leading reason for the failure of late-stage drug trials is lack of efficacy, so in my project I developed a machine learning strategy using validated drug-target pairs to help prioritise drug targets. While this has been attempted before, my angle was to try to understand whether machine learning could distinguish the different characteristics that define a good target for different therapeutic areas.
For example, whereas a relevant target for cancer might have a very distinctive pattern of somatic mutations, that same pattern is usually absent or uninformative when prioritising targets for rare diseases. By contrast, targets for complex or common diseases are sometimes successfully pinpointed using GWAS in combination with other functional genomics approaches. My objective was to identify the non-intuitive correlations between the data sources while still being able to process large amounts of information from the Open Targets Platform.
My project faced some of the common challenges of dealing with biological data. Data is usually sparse and very complex, and gold standards are difficult to define. Despite such challenges, I was able to create several machine learning models (random forest, gradient boosting machine (GBM), and support vector machine (SVM)) to predict targets in the three main Open Targets therapeutic areas: cancer, immune diseases, and neurological diseases. Interpreting the feature relevance helped to identify the relative importance of the different data sources for each therapeutic area. Hopefully, my findings will help guide the future prioritisations offered by the Open Targets Platform and provide a more flexible strategy for different therapeutic areas.
What interested you about this project?
I was excited to work on this project so that I could apply the machine learning theory I’d learnt in class to a real-world problem, with data that matters. I also wanted to see how accurate these models could be when they were based on messy data.
In our classes, the data is always cleanly labelled whereas, due to the nature of this project, I had to work with mislabelled data. For example, just because there isn’t a drug for a particular target-disease pair doesn’t mean the pair isn’t genetically validated; there could have been issues with the clinical trials, such as safety concerns or simply a lack of patients.
During my internship, David Ochoa also helped me learn to code with PySpark; writing code that not only works but also runs efficiently makes a huge difference.
What’s next for you?
I’m currently taking quite a few machine learning classes, and I’ve been able to apply what I learnt at Open Targets. This is my last semester of university, after which I will be looking for a hands-on job in machine learning or software engineering. I don’t have a background in biology, but I am definitely more open to working in a biological domain than I was prior to this internship.
Cover illustration by Freepik Storyset