MMORE: Massive Multimodal Open RAG & Extraction

📅 2025-09-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses knowledge extraction and retrieval from heterogeneous multimodal documents (text, tables, images, emails, audio, and video) by proposing MMORE, a scalable, open-source pipeline for RAG and information extraction. Its modular, distributed architecture parallelizes processing across CPUs and GPUs, converts every input into a unified format for downstream LLM applications, and integrates hybrid dense-sparse retrieval. Key contributions include: (1) native support for more than fifteen file types; (2) both interactive APIs and batch RAG endpoints; (3) a 3.8× speedup over single-node baselines and 40% higher accuracy than Docling on scanned PDFs in processing benchmarks; and (4) improved biomedical QA accuracy on PubMedQA as retrieval depth increases.

📝 Abstract
We introduce MMORE, an open-source pipeline for Massive Multimodal Open Retrieval-Augmented Generation and Extraction, designed to ingest, transform, and retrieve knowledge from heterogeneous document formats at scale. MMORE supports more than fifteen file types, including text, tables, images, emails, audio, and video, and processes them into a unified format to enable downstream applications for LLMs. The architecture offers modular, distributed processing, enabling scalable parallelization across CPUs and GPUs. On processing benchmarks, MMORE demonstrates a 3.8-fold speedup over single-node baselines and 40% higher accuracy than Docling on scanned PDFs. The pipeline integrates hybrid dense-sparse retrieval and supports both interactive APIs and batch RAG endpoints. Evaluated on PubMedQA, MMORE-augmented medical LLMs improve biomedical QA accuracy with increasing retrieval depth. MMORE provides a robust, extensible foundation for deploying task-agnostic RAG systems on diverse, real-world multimodal data. The codebase is available at https://github.com/swiss-ai/mmore.
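The abstract names hybrid dense-sparse retrieval but does not say how the two signals are combined. The following is a minimal sketch of one common approach, assuming min-max normalization followed by a weighted sum; this fusion rule, the function names, the `alpha` weight, and the toy vectors are all illustrative assumptions and are not taken from the MMORE codebase. Plain term counting stands in for a real sparse scorer such as BM25.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sparse_score(query, doc):
    """Count query-term occurrences in the document (stand-in for BM25)."""
    counts = Counter(doc.lower().split())
    return float(sum(counts[term] for term in query.lower().split()))


def min_max(scores):
    """Rescale scores to [0, 1] so dense and sparse signals are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(query, query_vec, docs, doc_vecs, alpha=0.5, k=2):
    """Return indices of the top-k documents under a weighted dense/sparse fusion.

    alpha=1.0 uses only the dense signal, alpha=0.0 only the sparse one.
    """
    dense = min_max([cosine(query_vec, v) for v in doc_vecs])
    sparse = min_max([sparse_score(query, d) for d in docs])
    fused = [alpha * d + (1 - alpha) * s for d, s in zip(dense, sparse)]
    order = sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)
    return order[:k]


# Toy corpus with hand-made 2-d "embeddings" (hypothetical values).
docs = [
    "aspirin reduces fever and pain",
    "insulin regulates blood glucose",
    "glucose monitoring for diabetes patients",
]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.9]]
query = "blood glucose control"
query_vec = [0.1, 1.0]

print(hybrid_rank(query, query_vec, docs, doc_vecs))
```

Raising `k` corresponds to the "retrieval depth" the abstract varies in the PubMedQA evaluation: more passages are handed to the LLM per question.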
Problem

Research questions and friction points this paper is trying to address.

Processes diverse multimodal documents into unified format
Enables scalable retrieval-augmented generation for LLMs
Improves accuracy and speed in document understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified multimodal document processing pipeline
Modular distributed CPU-GPU parallel architecture
Hybrid dense-sparse retrieval integration
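The first two points above can be illustrated with a small sketch: per-format processors registered by file extension, each returning the same record type, and a worker pool that processes files in parallel. Everything here is an assumption made for illustration: the `Sample` record, the processor functions, and the `PROCESSORS` registry are hypothetical names, and a local thread pool stands in for the distributed CPU/GPU workers the paper describes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Sample:
    """Hypothetical unified record: extracted text plus modality references."""
    source: str
    text: str
    modalities: list = field(default_factory=list)


def process_text(name, payload):
    """Plain-text processor: trim surrounding whitespace."""
    return Sample(source=name, text=payload.strip())


def process_email(name, payload):
    """Email processor: keep only the body after the first blank line."""
    _, _, body = payload.partition("\n\n")
    return Sample(source=name, text=body.strip())


# Registry mapping file extensions to processors (illustrative subset).
PROCESSORS = {".txt": process_text, ".md": process_text, ".eml": process_email}


def process_one(item):
    """Dispatch one (filename, content) pair; unsupported formats yield None."""
    name, payload = item
    handler = PROCESSORS.get(PurePosixPath(name).suffix.lower())
    if handler is None:
        return None
    return handler(name, payload)


def process_all(items, workers=4):
    """Process items in parallel, preserving input order and dropping unsupported files."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_one, items))
    return [r for r in results if r is not None]


items = [
    ("notes.txt", "  hello world  "),
    ("msg.eml", "From: a@example.org\nTo: b@example.org\n\nHi there\n"),
    ("blob.xyz", "unsupported"),
]
for sample in process_all(items):
    print(sample.source, "->", sample.text)
```

Because every processor returns the same record type, downstream indexing and retrieval code does not need to know which format a document came from; adding a format means registering one more function.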
Alexandre Sallinen
EPFL, Switzerland
Stefan Krsteski
EPFL, Switzerland
Paul Teiletche
EPFL, Switzerland
Marc-Antoine Allard
EPFL, Switzerland
Baptiste Lecoeur
EPFL, Switzerland
Michael Zhang
EPFL, Switzerland
David Kalajdzic
EPFL, Switzerland
Matthias Meyer
ETHZ, Switzerland
Fabrice Nemo
EPFL, Switzerland
Mary-Anne Hartley
EPFL, Harvard-Chan (Ariadne labs), CMU-Africa
Digital Global Health, Implementable AI4Health, Machine learning, Distributed Learning