🤖 AI Summary
This work addresses multi-omics representation learning with a lightweight, fine-tuning-free cross-modal fusion framework that integrates pre-trained DNA, mRNA, and protein language models into unified molecular representations. Methodologically, it introduces (i) the first codon-level embedding alignment mechanism grounded in the central dogma, enabling structured semantic alignment across the three modalities; and (ii) three plug-and-play fusion strategies (embedding concatenation, entropy-regularized attention pooling inspired by multiple-instance learning, and cross-modal multi-head attention), all while keeping the pre-trained model parameters frozen. Evaluated on five molecular property prediction tasks, the framework consistently outperforms unimodal baselines, showing that complementary information across pre-trained omics models can be captured efficiently by simple, parameter-preserving fusion. The results support both the biological plausibility of codon-level alignment and the practical efficacy of modular cross-modal integration in multi-omics representation learning.
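The codon-level alignment idea can be made concrete with a small sketch. Under the central dogma, three nucleotides map to one amino acid, so per-nucleotide DNA/mRNA embeddings of length 3L can be pooled into L codon-level vectors that line up position-for-position with per-residue protein embeddings. The NumPy function below is illustrative only (mean pooling and the shapes are assumptions for exposition, not the paper's exact implementation):

```python
import numpy as np

def pool_to_codon_level(nt_embeddings: np.ndarray) -> np.ndarray:
    """Average-pool per-nucleotide embeddings over non-overlapping triplets.

    nt_embeddings: (3L, d) embeddings from a DNA or mRNA language model.
    Returns: (L, d) codon-level embeddings, one vector per codon, aligned
    with per-residue protein embeddings (one codon -> one amino acid).
    """
    n, d = nt_embeddings.shape
    assert n % 3 == 0, "sequence length must be a whole number of codons"
    return nt_embeddings.reshape(n // 3, 3, d).mean(axis=1)

# Toy example: 6 nucleotides (2 codons), embedding dimension 4.
dna = np.arange(24, dtype=float).reshape(6, 4)
codon = pool_to_codon_level(dna)  # shape (2, 4)
```

Any pooling that respects codon boundaries (mean, max, a learned projection) would preserve the one-codon-to-one-residue correspondence; mean pooling is just the simplest choice.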
📝 Abstract
We present BioLangFusion, a simple approach for integrating pre-trained DNA, mRNA, and protein language models into unified molecular representations. Motivated by the central dogma of molecular biology (information flow from gene to transcript to protein), we align per-modality embeddings at the biologically meaningful codon level (three nucleotides encoding one amino acid) to ensure direct cross-modal correspondence. BioLangFusion studies three standard fusion techniques: (i) codon-level embedding concatenation, (ii) entropy-regularized attention pooling inspired by multiple-instance learning, and (iii) cross-modal multi-head attention -- each technique providing a different inductive bias for combining modality-specific signals. These methods require no additional pre-training or modification of the base models, allowing straightforward integration with existing sequence-based foundation models. Across five molecular property prediction tasks, BioLangFusion outperforms strong unimodal baselines, showing that even simple fusion of pre-trained models can capture complementary multi-omic information with minimal overhead.
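To make the three fusion strategies concrete, here is a minimal NumPy sketch. It assumes the modality embeddings have already been aligned at the codon level and (for pooling and attention) projected to a shared dimension `d`; the scoring vector `w`, the single attention head, and the entropy coefficient are illustrative assumptions, not the paper's parameterization:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def concat_fusion(dna, rna, prot):
    """(i) Codon-level concatenation: stack aligned (L, d_m) embeddings
    along the feature axis -> (L, d_dna + d_rna + d_prot)."""
    return np.concatenate([dna, rna, prot], axis=-1)

def attention_pool(h, w, reg_weight=0.01):
    """(ii) Attention pooling over modalities at each codon position, with
    an entropy bonus that discourages collapsing onto a single modality
    (a simplified stand-in for the paper's MIL-style pooling).

    h: (M, L, d) stacked modality embeddings; w: (d,) scoring vector
    (hypothetical learned parameter). Returns fused (L, d) and the
    regularization term to add to the task loss."""
    scores = h @ w                      # (M, L) per-modality scores
    alpha = softmax(scores, axis=0)     # weights over modalities
    fused = (alpha[..., None] * h).sum(axis=0)              # (L, d)
    entropy = -(alpha * np.log(alpha + 1e-9)).sum(axis=0).mean()
    return fused, -reg_weight * entropy  # maximizing entropy lowers loss

def cross_attention(q, kv):
    """(iii) Single-head sketch of cross-modal attention: one modality's
    codon embeddings (q) attend over another's (kv). The paper uses
    multi-head attention; heads and projections are omitted for brevity.
    q: (Lq, d), kv: (Lk, d) -> (Lq, d)."""
    d = q.shape[-1]
    attn = softmax(q @ kv.T / np.sqrt(d), axis=-1)  # (Lq, Lk)
    return attn @ kv
```

Each strategy trades capacity for simplicity differently: concatenation adds no parameters at all, attention pooling learns one weight per modality per position, and cross-attention lets positions in one modality query another.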