CollEX -- A Multimodal Agentic RAG System Enabling Interactive Exploration of Scientific Collections

2025-04-10
Citations: 0
Influential: 0
AI Summary
To address the limited intuitiveness and interactivity of conventional search systems for scientific collections, this paper proposes CollEx, a multimodal agentic Retrieval-Augmented Generation (RAG) system. It uses Large Vision-Language Models (LVLMs) as tool-equipped multimodal agents, accessible through a chat interface, so that users can explore collections across text and images in natural language. The system is designed to support curiosity-driven exploration, the discovery of interdisciplinary connections, and the complementing of visual data. The authors illustrate it with a proof-of-concept application containing over 64,000 unique records across 32 collections from a public university. The intended uses are educational scenarios (teachers, pupils, students) and research support.

๐Ÿ“ Abstract
In this paper, we introduce CollEx, an innovative multimodal agentic Retrieval-Augmented Generation (RAG) system designed to enhance interactive exploration of extensive scientific collections. Given the overwhelming volume and inherent complexity of scientific collections, conventional search systems often lack the necessary intuitiveness and interactivity, presenting substantial barriers for learners, educators, and researchers. CollEx addresses these limitations by employing state-of-the-art Large Vision-Language Models (LVLMs) as multimodal agents accessible through an intuitive chat interface. By abstracting complex interactions via specialized agents equipped with advanced tools, CollEx facilitates curiosity-driven exploration, significantly simplifying access to diverse scientific collections and the records therein. Our system integrates textual and visual modalities, supporting educational scenarios that are helpful for teachers, pupils, students, and researchers by fostering independent exploration as well as scientific excitement and curiosity. Furthermore, CollEx serves the research community by discovering interdisciplinary connections and complementing visual data. We illustrate the effectiveness of our system through a proof-of-concept application containing over 64,000 unique records across 32 collections drawn from a public university's local scientific collection.
Problem

Research questions and friction points this paper is trying to address.

Enhances interactive exploration of large scientific collections
Addresses lack of intuitiveness in conventional search systems
Integrates textual and visual modalities for educational use
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal agentic RAG system for scientific exploration
Uses Large Vision-Language Models (LVLMs) as agents
Integrates textual and visual modalities for education
Authors

Florian Schneider
Ph.D. Student, Universität Hamburg
Vision-Language Models, Multilingual VLMs, Multicultural VLMs, Cross-Modal Information Retrieval
Narges Baba Ahmadi
Hub of Computing and Data Science, University of Hamburg, Germany
Niloufar Baba Ahmadi
Hub of Computing and Data Science, University of Hamburg, Germany
Iris Vogel
Center for Sustainable Research Data Management, University of Hamburg, Germany
Martin Semmann
Hub of Computing and Data Science, University of Hamburg
Information Systems, Service Science, IT Management, Applied AI
Christian Biemann
Hub of Computing and Data Science, University of Hamburg, Germany