NSF Grant Funds Protein Research for Drug Discovery and Personalized Medicine
Proteins, including antibodies, hemoglobin, and insulin, power nearly every vital aspect of life. Breakthroughs in protein research are producing vaccines, resilient crops, bioenergy sources, and other innovative technologies.
Despite their importance, most of what scientists know about proteins comes from only a small sample of them. This stands in the way of fully understanding how most proteins work and unlocking their full potential.
Georgia Tech’s Yunan Luo believes artificial intelligence (AI) could fill this knowledge gap. The National Science Foundation agrees. Luo is the recipient of an NSF Faculty Early Career Development (CAREER) award.
“So much of biology depends on knowing what proteins do, but decades of research have concentrated on a relatively small set of well-studied proteins. This imbalance in scientific attention leads to a distorted view of the biological landscape that quietly shapes our data and our algorithms,” Luo said.
“My group’s goal is to build machine learning (ML) models that actively close this gap by generating trustworthy function predictions for the many proteins that remain understudied.”
In his proposal to NSF, Luo coined the term “annotation inequality” for this rich-get-richer effect.
One problem with annotation inequality is that it slows progress in disease prognosis, drug discovery, and other critical biomedical areas. It is difficult to innovate when working only with the few proteins that scientists already know so much about.
A cascading effect of annotation inequality is that it diminishes the effectiveness of studying proteins with AI.
AI methods learn from existing experimental data. When datasets are skewed toward well-known proteins, that bias propagates and becomes entrenched in models. Over time, this makes it harder for computational methods to shed light on understudied proteins.
“Protein annotation inequality creates an effect analogous to a vast library where 95% of patrons only read the top 5% popular books, leaving the rest of the collection to gather dust,” Luo said.
“This has resulted in knowledge disparities across proteins in current literature and databases, biasing our understanding of protein functions.”
The NSF CAREER award will provide Luo with more than $770,000 over the next five years to tackle the problem of protein annotation inequality head-on.
Luo will use the grant to build an accurate, unbiased protein function prediction framework at scale. His project aims to:
- Reveal how annotation inequality affects protein function prediction systems
- Create ML techniques suited for biological data, which is often noisy, incomplete, and imbalanced
- Integrate data and ML models into a scalable framework to accelerate discoveries involving understudied proteins
In an effort meant to be more enduring than the ML framework itself, Luo will leverage the NSF award to support educational and outreach programs. His goal is to prepare the next generation of researchers to study other challenges in computational biology, not just the annotation inequality problem.
Luo teaches graduate and undergraduate courses focused on computational biology and ML. Problems and methods developed through the CAREER project can be used as course material in his classes.
Luo also championed collaboration with Georgia Tech’s Center for Education Integrating Science, Mathematics, and Computing (CEISMC) in his proposal.
Through this partnership, local high school teachers and students would gain access to his data and models. This would promote deeper learning of biology and data science through hands-on experience with real-world tools.
Luo sees reaching students and the community as a way of paying forward the support he received from Georgia Tech colleagues.
“I am incredibly grateful for this recognition from the NSF,” said Luo, an assistant professor in the School of Computational Science and Engineering (CSE).
“This would not have been possible without my students and collaborators, whose hard work laid the groundwork for this proposal.”
Luo praised CSE faculty members B. Aditya Prakash, Xiuwei Zhang, and Chao Zhang for their guidance. All three study machine learning and computational bioscience, two of CSE’s five core research areas.
Luo also thanked Haesun Park for her support and recommendation for the CAREER award. Park is a Regents’ Professor and the chair of the School of CSE.