Abstract: TH-PO031
StarFunc: Accurate Protein Function Prediction Reveals Novel Human Proteins Involved in Ubiquitination
Session Information
- Augmented Intelligence for Prediction and Image Analysis
October 24, 2024 | Location: Exhibit Hall, Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: Augmented Intelligence, Digital Health, and Data Science
- 300 Augmented Intelligence, Digital Health, and Data Science
Authors
- Zhang, Chengxin, University of Michigan, Ann Arbor, Michigan, United States
- Freddolino, Lydia, University of Michigan, Ann Arbor, Michigan, United States
Background
Even in the very well-studied human proteome, many proteins remain poorly annotated, yet may still make important contributions to health and disease. Deep learning has significantly advanced the development of novel methods for protein function prediction. Yet, even for state-of-the-art deep learning approaches, template information remains an indispensable component. While many prediction methods use templates identified through sequence homology or protein-protein interactions, very few detect templates through structural similarity, even though a protein's structure is the basis of its function.
Methods
In this work, we developed StarFunc, a composite approach that integrates state-of-the-art deep learning models seamlessly with template information from structural similarity, sequence homology, protein-protein interaction partners, and protein domain families (Fig. 1).
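The abstract does not say how the component predictions are merged. As a purely illustrative sketch, and not StarFunc's actual procedure, one simple way to form a composite score is a weighted average of per-component confidence scores for each Gene Ontology (GO) term. The function name `combine_scores`, the component names, and the weights are all hypothetical.

```python
from typing import Dict


def combine_scores(
    component_scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Weighted average of per-component GO term scores (illustrative only).

    component_scores maps a component name (e.g. "structure") to a
    dictionary of GO term -> confidence score in [0, 1].
    weights maps each component name to a non-negative weight.
    A term missing from a component contributes a score of 0 for it.
    """
    total = sum(weights[name] for name in component_scores)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    combined: Dict[str, float] = {}
    for name, scores in component_scores.items():
        share = weights[name] / total  # normalized weight of this component
        for term, score in scores.items():
            combined[term] = combined.get(term, 0.0) + share * score
    return combined
```

With equal weights, a term scored 0.8 by a hypothetical structure component and 0.4 by a hypothetical sequence component would receive a combined score of about 0.6.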
Results
We compared the accuracy of StarFunc against 6 existing deep learning methods and 3 template-based methods on 2475 proteins. The weighted F-measure of StarFunc is 12% higher than that of the second-best approach. StarFunc participated in the Critical Assessment of Function Annotation 5 (CAFA5) challenge and was ranked 5th among 1625 teams from 96 countries. We applied StarFunc to all 20389 proteins in the human reference proteome curated by neXtProt and identified significant enrichments of several important functions among the set of currently cryptic human proteins. For example, we discovered 15 uncharacterized proteins that are likely components of protein-ubiquitin transferase complexes and 1 putative deubiquitinase.
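The abstract does not define the weighted F-measure. The sketch below assumes the definition commonly used in CAFA-style evaluations: precision and recall are weighted by a per-term weight (typically information content), averaged over proteins, and the maximum F-measure over score thresholds is reported. The function name `weighted_fmax`, the input layout, and the threshold grid are assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Set


def weighted_fmax(
    pred: Dict[str, Dict[str, float]],
    truth: Dict[str, Set[str]],
    ic: Dict[str, float],
    thresholds: Optional[Iterable[float]] = None,
) -> float:
    """Maximum weighted F-measure over score thresholds (illustrative).

    pred:  protein -> {GO term: confidence score}
    truth: protein -> set of true GO terms
    ic:    GO term -> weight (e.g. information content); missing terms count as 0
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]

    best = 0.0
    for t in thresholds:
        precision_sum, recall_sum, n_predicted = 0.0, 0.0, 0
        for protein, true_terms in truth.items():
            scores = pred.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            hit_w = sum(ic.get(term, 0.0) for term in predicted & true_terms)
            true_w = sum(ic.get(term, 0.0) for term in true_terms)
            if predicted:
                # Precision is averaged only over proteins with a prediction.
                pred_w = sum(ic.get(term, 0.0) for term in predicted)
                if pred_w > 0:
                    precision_sum += hit_w / pred_w
                n_predicted += 1
            if true_w > 0:
                recall_sum += hit_w / true_w
        if n_predicted == 0 or not truth:
            continue
        precision = precision_sum / n_predicted
        recall = recall_sum / len(truth)  # averaged over all benchmark proteins
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

Under this definition, rare and specific GO terms (high weight) contribute more to the score than common, general ones, so a method is rewarded for informative predictions rather than for repeating generic annotations.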
Conclusion
A large-scale benchmark demonstrates StarFunc's advantage over both deep learning methods and conventional template-based predictors. Applying StarFunc to the human proteome reveals novel functions of previously uncharacterized proteins, especially those involved in (de)ubiquitination, providing an entry point for studying fundamental new biology involving those proteins.
Fig. 1. Flowchart of StarFunc.
Funding
- Other NIH Support