🤖 AI Summary
This study addresses the challenge of modeling protein–nucleic acid interactions, specifically predicting binding free energy (ΔG) and identifying key binding residues. We introduce the largest open-source multi-omic sequence Transformer foundation model to date, trained by joint self-supervised learning over unlabeled DNA/RNA and protein sequences. Without any structural labels, the model learns joint representations that are consistent with the Central Dogma and that carry structural information relevant to binding. Our method innovates with cross-modal embedding alignment and multimodal joint pretraining. After fine-tuning, it achieves state-of-the-art performance on ΔG prediction and can identify the protein residues most involved in binding. In many cases it also surpasses single-omic baselines in both performance-per-FLOP and absolute performance, suggesting a more general approach to building foundation models for biology.
📝 Abstract
The transformer architecture has revolutionized bioinformatics and driven progress in the understanding and prediction of the properties of biomolecules. Almost all research on large-scale biosequence transformers has focused on one domain at a time (single-omic), usually DNA/RNA or proteins. These models have seen incredible success in downstream tasks in each domain, and have achieved particularly noteworthy breakthroughs in sequence modeling and structural modeling. However, these single-omic models are naturally incapable of efficiently modeling multi-omic tasks, one of the most biologically critical being protein-nucleic acid interactions. We present our work training the largest open-source multi-omic foundation model to date. We show that these multi-omic models (MOMs) can learn joint representations between various single-omic distributions that are emergently consistent with the Central Dogma of molecular biology despite only being trained on unlabeled biosequences. We further demonstrate that MOMs can be fine-tuned to achieve state-of-the-art results on protein-nucleic acid interaction tasks, namely predicting the change in Gibbs free energy ($\Delta G$) of the binding interaction between a given nucleic acid and protein. Remarkably, we show that multi-omic biosequence transformers emergently learn useful structural information without any \textit{a priori} structural training, allowing us to predict which protein residues are most involved in the protein-nucleic acid binding interaction. Lastly, we provide evidence that multi-omic biosequence models are in many cases superior to foundation models trained on single-omic distributions, both in performance-per-FLOP and absolute performance, suggesting a more generalized or foundational approach to building these models for biology.