🤖 AI Summary
This work addresses multi-omics representation learning with a lightweight, fine-tuning-free cross-modal fusion framework that integrates pre-trained DNA, mRNA, and protein language models into unified molecular representations. Methodologically, it introduces (i) the first codon-level embedding alignment mechanism grounded in the central dogma, enabling structured semantic alignment across the three modalities; and (ii) three plug-and-play fusion strategies (embedding concatenation, entropy-regularized attention pooling inspired by multiple-instance learning, and cross-modal multi-head attention), all while keeping the pre-trained model parameters frozen. Evaluated on five molecular property prediction tasks, the framework consistently outperforms unimodal baselines, showing that complementary information across pre-trained omics models can be captured efficiently by simple, parameter-preserving fusion. The results support both the biological plausibility of codon-level alignment and the practical efficacy of modular cross-modal integration in multi-omics representation learning.
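The codon-level alignment idea can be made concrete with a small sketch. Under the central dogma, three nucleotides map to one amino acid, so per-nucleotide DNA/mRNA embeddings of length 3L can be pooled into L codon-level vectors that line up position-for-position with per-residue protein embeddings. The NumPy function below is illustrative only (mean pooling and the shapes are assumptions for exposition, not the paper's exact implementation):

```python
import numpy as np

def pool_to_codon_level(nt_embeddings: np.ndarray) -> np.ndarray:
    """Average-pool per-nucleotide embeddings over non-overlapping triplets.

    nt_embeddings: (3L, d) embeddings from a DNA or mRNA language model.
    Returns: (L, d) codon-level embeddings, one vector per codon, aligned
    with per-residue protein embeddings (one codon -> one amino acid).
    """
    n, d = nt_embeddings.shape
    assert n % 3 == 0, "sequence length must be a whole number of codons"
    return nt_embeddings.reshape(n // 3, 3, d).mean(axis=1)

# Toy example: 6 nucleotides (2 codons), embedding dimension 4.
dna = np.arange(24, dtype=float).reshape(6, 4)
codon = pool_to_codon_level(dna)  # shape (2, 4)
```

Any pooling that respects codon boundaries (mean, max, a learned projection) would preserve the one-codon-to-one-residue correspondence; mean pooling is just the simplest choice.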
📝 Abstract
We present BioLangFusion, a simple approach for integrating pre-trained DNA, mRNA, and protein language models into unified molecular representations. Motivated by the central dogma of molecular biology (information flow from gene to transcript to protein), we align per-modality embeddings at the biologically meaningful codon level (three nucleotides encoding one amino acid) to ensure direct cross-modal correspondence. BioLangFusion studies three standard fusion techniques: (i) codon-level embedding concatenation, (ii) entropy-regularized attention pooling inspired by multiple-instance learning, and (iii) cross-modal multi-head attention -- each technique providing a different inductive bias for combining modality-specific signals. These methods require no additional pre-training or modification of the base models, allowing straightforward integration with existing sequence-based foundation models. Across five molecular property prediction tasks, BioLangFusion outperforms strong unimodal baselines, showing that even simple fusion of pre-trained models can capture complementary multi-omic information with minimal overhead.
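To make the three fusion strategies concrete, here is a minimal NumPy sketch. It assumes the modality embeddings have already been aligned at the codon level and (for pooling and attention) projected to a shared dimension `d`; the scoring vector `w`, the single attention head, and the entropy coefficient are illustrative assumptions, not the paper's parameterization:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def concat_fusion(dna, rna, prot):
    """(i) Codon-level concatenation: stack aligned (L, d_m) embeddings
    along the feature axis -> (L, d_dna + d_rna + d_prot)."""
    return np.concatenate([dna, rna, prot], axis=-1)

def attention_pool(h, w, reg_weight=0.01):
    """(ii) Attention pooling over modalities at each codon position, with
    an entropy bonus that discourages collapsing onto a single modality
    (a simplified stand-in for the paper's MIL-style pooling).

    h: (M, L, d) stacked modality embeddings; w: (d,) scoring vector
    (hypothetical learned parameter). Returns fused (L, d) and the
    regularization term to add to the task loss."""
    scores = h @ w                      # (M, L) per-modality scores
    alpha = softmax(scores, axis=0)     # weights over modalities
    fused = (alpha[..., None] * h).sum(axis=0)              # (L, d)
    entropy = -(alpha * np.log(alpha + 1e-9)).sum(axis=0).mean()
    return fused, -reg_weight * entropy  # maximizing entropy lowers loss

def cross_attention(q, kv):
    """(iii) Single-head sketch of cross-modal attention: one modality's
    codon embeddings (q) attend over another's (kv). The paper uses
    multi-head attention; heads and projections are omitted for brevity.
    q: (Lq, d), kv: (Lk, d) -> (Lq, d)."""
    d = q.shape[-1]
    attn = softmax(q @ kv.T / np.sqrt(d), axis=-1)  # (Lq, Lk)
    return attn @ kv
```

Each strategy trades capacity for simplicity differently: concatenation adds no parameters at all, attention pooling learns one weight per modality per position, and cross-attention lets positions in one modality query another.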