CrossLLM-Mamba: Multimodal State Space Fusion of LLMs for RNA Interaction Prediction

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing RNA interaction prediction methods, which rely on static fusion strategies and fail to capture the dynamic, context-dependent nature of binding processes. To overcome this, we introduce a dynamic alignment framework that, for the first time, integrates state space models into multimodal biological large language model (BioLLM) fusion. Our approach employs a bidirectional Mamba encoder to enable context-aware interactions between RNA and protein embeddings through a sequence state transition mechanism. Leveraging BioLLM representations from ESM-2 and RiNALMo, augmented with Gaussian noise injection and optimized with Focal Loss, the method retains linear computational complexity while improving performance: it attains an MCC of 0.892 on RPI1460, surpassing the previous state of the art by 5.2%, and achieves Pearson correlation coefficients exceeding 0.95 on both riboswitch and repeat RNA affinity prediction tasks.
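To make the "sequence state transition" idea concrete, the following is a minimal sketch of a bidirectional linear state-space scan over a fused sequence. It is not the paper's encoder: Mamba uses input-dependent (selective) parameters and vector-valued hidden states, whereas this sketch assumes fixed scalar parameters `a`, `b`, `c`, and it assumes the two directions are combined by summation. The function names and toy inputs are hypothetical.

```python
from typing import List


def ssm_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Run the recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t in one pass.

    Each step touches one element, so cost grows linearly with sequence length.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x  # hidden state carries context from earlier tokens
        ys.append(c * h)
    return ys


def bidirectional_scan(xs: List[float], a: float = 0.5,
                       b: float = 1.0, c: float = 1.0) -> List[float]:
    """Combine a left-to-right and a right-to-left scan (summation is an assumption)."""
    forward = ssm_scan(xs, a, b, c)
    backward = ssm_scan(xs[::-1], a, b, c)[::-1]
    return [f + r for f, r in zip(forward, backward)]


# Hypothetical toy features: RNA tokens followed by protein tokens.
rna = [1.0, 0.0]
protein = [0.0, 2.0]
fused = bidirectional_scan(rna + protein)
print(fused)
```

Because the forward pass propagates RNA context into the protein positions and the backward pass does the reverse, every output position depends on both modalities, which is the intuition behind the "crosstalk" described above.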

📝 Abstract
Accurate prediction of RNA-associated interactions is essential for understanding cellular regulation and advancing drug discovery. While Biological Large Language Models (BioLLMs) such as ESM-2 and RiNALMo provide powerful sequence representations, existing methods rely on static fusion strategies that fail to capture the dynamic, context-dependent nature of molecular binding. We introduce CrossLLM-Mamba, a novel framework that reformulates interaction prediction as a state-space alignment problem. By leveraging bidirectional Mamba encoders, our approach enables deep "crosstalk" between modality-specific embeddings through hidden state propagation, modeling interactions as dynamic sequence transitions rather than static feature overlaps. The framework maintains linear computational complexity, making it scalable to high-dimensional BioLLM embeddings. We further incorporate Gaussian noise injection and Focal Loss to enhance robustness against hard-negative samples. Comprehensive experiments across three interaction categories (RNA-protein, RNA-small molecule, and RNA-RNA) demonstrate that CrossLLM-Mamba achieves state-of-the-art performance. On the RPI1460 benchmark, our model attains an MCC of 0.892, surpassing the previous best by 5.2%. For binding affinity prediction, we achieve Pearson correlations exceeding 0.95 on riboswitch and repeat RNA subtypes. These results establish state-space modeling as a powerful paradigm for multi-modal biological interaction prediction.
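The training and evaluation ingredients named in the abstract can be sketched in a few lines. The block below shows the standard binary focal loss, Gaussian noise added to an embedding, and the Matthews correlation coefficient (MCC). The abstract does not give hyperparameters, so `gamma=2.0`, `alpha=0.25`, and `sigma=0.01` are assumptions chosen for illustration, and all function names are hypothetical rather than taken from the authors' code.

```python
import math
import random
from typing import List, Optional


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for one sample; p is the predicted probability of class 1.

    The (1 - p_t)**gamma factor shrinks the loss of well-classified samples,
    so hard examples dominate the gradient. gamma and alpha are assumed values.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)  # guard against log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def add_gaussian_noise(embedding: List[float], sigma: float = 0.01,
                       rng: Optional[random.Random] = None) -> List[float]:
    """Perturb each embedding coordinate with zero-mean Gaussian noise."""
    rng = rng or random.Random()
    return [v + rng.gauss(0.0, sigma) for v in embedding]


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# A confident correct prediction is penalized far less than an uncertain one.
easy = focal_loss(0.9, 1)
hard = focal_loss(0.6, 1)
noisy = add_gaussian_noise([0.1, 0.2, 0.3], sigma=0.01, rng=random.Random(0))
```

With `gamma=0` and `alpha=0.5` the focal loss reduces to half the ordinary cross-entropy, which is a quick sanity check on any implementation.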
Problem

Research questions and friction points this paper is trying to address.

RNA interaction prediction
dynamic context
static fusion
molecular binding
multimodal representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

State Space Modeling
Mamba Encoder
Multimodal Fusion
Dynamic Interaction Prediction
BioLLM