MoMA: A Mixture-of-Multimodal-Agents Architecture for Enhancing Clinical Prediction Modelling

📅 2025-08-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing challenges in multimodal electronic health record (EHR) integration—including heterogeneous data fusion, high data dependency, and limited interpretability—this paper proposes a hierarchical multi-agent collaborative framework comprising expert, aggregation, and prediction agents. The framework enables controllable translation of non-textual modalities (e.g., medical images and laboratory values) into structured clinical text summaries and supports unified multimodal reasoning. Centered on large language models (LLMs), the method orchestrates heterogeneous EHR data without requiring large-scale annotated datasets, thereby enhancing generalization. Evaluated on three real-world clinical prediction tasks—sepsis early warning, length-of-stay estimation, and in-hospital mortality risk assessment—the approach consistently outperforms state-of-the-art methods, achieving an average AUC improvement of 3.2%. It further demonstrates strong cross-task adaptability and clinically meaningful interpretability.

Technology Category

Application Category

📝 Abstract
Multimodal electronic health record (EHR) data provide richer, complementary insights into patient health compared to single-modality data. However, effectively integrating diverse data modalities for clinical prediction modeling remains challenging due to the substantial data requirements. We introduce a novel architecture, Mixture-of-Multimodal-Agents (MoMA), designed to leverage multiple large language model (LLM) agents for clinical prediction tasks using multimodal EHR data. MoMA employs specialized LLM agents ("specialist agents") to convert non-textual modalities, such as medical images and laboratory results, into structured textual summaries. These summaries, together with clinical notes, are combined by another LLM ("aggregator agent") to generate a unified multimodal summary, which is then used by a third LLM ("predictor agent") to produce clinical predictions. Evaluating MoMA on three prediction tasks using real-world datasets with different modality combinations and prediction settings, MoMA outperforms current state-of-the-art methods, highlighting its enhanced accuracy and flexibility across various tasks.
Problem

Research questions and friction points this paper is trying to address.

Integrating multimodal EHR data for clinical predictions
Overcoming data requirements in clinical prediction modeling
Enhancing accuracy with specialized LLM agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

Specialist agents convert non-text data to text
Aggregator agent combines multimodal summaries
Predictor agent generates clinical predictions
🔎 Similar Papers
No similar papers found.
J
Jifan Gao
University of Wisconsin-Madison, Madison, Wisconsin, USA
M
Mahmudur Rahman
University of Wisconsin-Madison, Madison, Wisconsin, USA
J
John Caskey
University of Wisconsin-Madison, Madison, Wisconsin, USA
M
Madeline Oguss
University of Wisconsin-Madison, Madison, Wisconsin, USA
A
Ann O'Rourke
University of Wisconsin-Madison, Madison, Wisconsin, USA
R
Randy Brown
University of Wisconsin-Madison, Madison, Wisconsin, USA
A
Anne Stey
Northwestern University, Chicago, Illinois, USA
Anoop Mayampurath
Anoop Mayampurath
Assistant Professor, University of Wisconsin-Madison
Machine LearningStatisticsHealthcareBioinformaticsUW-Informatics
M
Matthew M. Churpek
University of Wisconsin-Madison, Madison, Wisconsin, USA
G
Guanhua Chen
University of Wisconsin-Madison, Madison, Wisconsin, USA
Majid Afshar
Majid Afshar
University of Wisconsin - Madison
Natural Language ProcessingArtificial IntelligenceMed InformaticsCritical CareUW-Informatics