BioMol-MQA: A Multi-Modal Question Answering Dataset For LLM Reasoning Over Bio-Molecular Interactions

📅 2025-06-06
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing RAG methods predominantly support unimodal (text-only) retrieval and fail to address the multimodal information landscape of biomedicine, e.g., knowledge graphs (KGs), clinical notes, and 3D molecular structures. Method: We introduce BioMol-MQA, the first multimodal question-answering dataset tailored to polypharmacy scenarios. It pioneers the integration of 3D molecular geometry into KGs, establishing a unified multimodal knowledge base comprising KGs, structured clinical notes, and geometric molecular representations. We design QA tasks requiring cross-modal reasoning and propose a standardized RAG evaluation framework. Contribution/Results: Experiments reveal that state-of-the-art LLMs achieve under 35% average accuracy on BioMol-MQA; even with full contextual grounding, performance gains remain marginal. This exposes critical bottlenecks in current multimodal RAG systems and fills a key gap in evaluating multimodal reasoning for biomedical applications.
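The paper does not publish its retrieval pipeline in this summary, but the core idea of cross-modal retrieval over a multimodal KG can be sketched in a few lines. Below is a minimal, hypothetical illustration: each KG node carries both a text embedding and a molecular-structure embedding, and a query is scored against both with a weighted fusion. All entity names, vectors, and the fusion weight `alpha` are invented for illustration; the actual BioMol-MQA setup may differ substantially.

```python
# Hypothetical sketch of cross-modal retrieval over a multimodal KG.
# Entity names, embeddings, and the score-fusion scheme are all invented
# for illustration; they are not the paper's actual method.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Each node holds a text embedding and a molecular-structure embedding.
KG_NODES = {
    "aspirin":   {"text": [0.9, 0.1, 0.0], "mol": [0.2, 0.8, 0.1]},
    "ibuprofen": {"text": [0.8, 0.2, 0.1], "mol": [0.3, 0.7, 0.2]},
    "warfarin":  {"text": [0.1, 0.9, 0.2], "mol": [0.9, 0.1, 0.3]},
}

def retrieve(query_text_vec, query_mol_vec, k=2, alpha=0.5):
    """Rank nodes by a weighted sum of text and structure similarity."""
    scored = []
    for name, node in KG_NODES.items():
        score = (alpha * cosine(query_text_vec, node["text"])
                 + (1 - alpha) * cosine(query_mol_vec, node["mol"]))
        scored.append((score, name))
    return [name for _, name in sorted(scored, reverse=True)[:k]]

# A query whose text and structure vectors both resemble the NSAID nodes.
top = retrieve([0.85, 0.15, 0.05], [0.25, 0.75, 0.15])
```

The fused score is the simplest possible late-fusion choice; the benchmark's point is that even well-retrieved multimodal context yields only marginal gains for current LLMs, so retrieval quality alone does not close the reasoning gap.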

๐Ÿ“ Abstract
Retrieval augmented generation (RAG) has shown great power in improving Large Language Models (LLMs). However, most existing RAG-based LLMs are dedicated to retrieving single-modality information, mainly text; while for many real-world problems, such as healthcare, information relevant to queries can manifest in various modalities such as knowledge graphs, text (clinical notes), and complex molecular structures. Thus, being able to retrieve relevant multi-modality domain-specific information, and to reason over and synthesize diverse knowledge to generate an accurate response, is important. To address this gap, we present BioMol-MQA, a new question-answering (QA) dataset on polypharmacy, which is composed of two parts: (i) a multimodal knowledge graph (KG) with text and molecular structure for information retrieval; and (ii) challenging questions designed to test LLM capabilities in retrieving and reasoning over the multimodal KG to answer questions. Our benchmarks indicate that existing LLMs struggle to answer these questions and do well only when given the necessary background data, signaling the necessity for strong RAG frameworks.
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM reasoning with multi-modal retrieval for bio-molecular interactions
Addressing gaps in RAG frameworks for healthcare and molecular data
Developing a QA dataset to test LLM capabilities in multi-modal reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal knowledge graph for retrieval
Retrieval augmented generation with diverse data
Integration of molecular structure and text