Harnessing Collective Intelligence of LLMs for Robust Biomedical QA: A Multi-Model Approach

📅 2025-08-02
🤖 AI Summary
To keep question answering (QA) reliable as the biomedical literature grows exponentially, this paper proposes a multi-LLM ensemble framework. The method evaluates 13 open-source large language models used as retrieval-augmented generators and combines their outputs with question-type-specific strategies: majority voting for yes/no questions and the union of answers for list and factoid questions. All possible model combinations are explored to build a tailored model pipeline for each question type. In the Synergy task of the 2025 BioASQ challenge, the system took 1st place for ideal answers and 2nd place for exact answers in round 2, and shared 1st place for exact answers in rounds 3 and 4. These results support heterogeneous multi-model ensembling as an effective approach to biomedical QA.

📝 Abstract
Biomedical text mining and question-answering are essential yet highly demanding tasks, particularly in the face of the exponential growth of biomedical literature. In this work, we present our participation in the 13th edition of the BioASQ challenge, which involves biomedical semantic question-answering for Task 13b and biomedical question-answering for developing topics for the Synergy task. We deploy a selection of open-source large language models (LLMs) as retrieval-augmented generators to answer biomedical questions. Various models are used to process the questions. A majority voting system combines their outputs to determine the final answer for Yes/No questions, while for list and factoid type questions, the union of their answers is used. We evaluated 13 state-of-the-art open-source LLMs, exploring all possible model combinations to contribute to the final answer, resulting in tailored LLM pipelines for each question type. Our findings provide valuable insight into which combinations of LLMs consistently produce superior results for specific question types. In the four rounds of the 2025 BioASQ challenge, our system achieved notable results: in the Synergy task, we secured 1st place for ideal answers and 2nd place for exact answers in round 2, as well as two shared 1st places for exact answers in rounds 3 and 4.
Problem

Research questions and friction points this paper is trying to address.

Improving biomedical QA accuracy using multiple LLMs
Developing tailored LLM pipelines for different question types
Enhancing robustness via majority voting and answer union
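
The tailored pipelines mentioned above come from exploring all possible model combinations, as the abstract states. With 13 models there are 2^13 - 1 = 8191 non-empty subsets, so an exhaustive search is feasible. The following is a minimal sketch of such a search, not the authors' implementation: the names best_combination and vote_accuracy, the accuracy metric, and the tie rule (ties count as "yes") are all assumptions made for illustration.

```python
from itertools import combinations


def vote_accuracy(subset, predictions, gold):
    """Accuracy of majority voting over `subset` on yes/no questions.

    `predictions` maps a model name to its list of "yes"/"no" answers.
    Assumption: a tie between yes and no votes is resolved as "yes".
    """
    correct = 0
    for i, expected in enumerate(gold):
        yes_votes = sum(predictions[m][i] == "yes" for m in subset)
        predicted = "yes" if 2 * yes_votes >= len(subset) else "no"
        correct += predicted == expected
    return correct / len(gold)


def best_combination(models, score_fn, min_size=1):
    """Score every non-empty subset of `models` and return the best one.

    Subsets are visited from smallest to largest, and only a strictly
    higher score replaces the current best, so the smallest of equally
    good subsets wins.
    """
    best_subset, best_score = None, float("-inf")
    for size in range(min_size, len(models) + 1):
        for subset in combinations(models, size):
            score = score_fn(subset)
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score
```

Running the search once per question type, each time with a scoring function suited to that type, would yield one model subset per type. The real system's metrics and validation data are not given in this summary.
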
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes multiple open-source LLMs for biomedical QA
Implements majority voting for Yes/No questions
Combines answers via union for list/factoid questions
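
The two combination rules listed above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the tie-break default, and the case-insensitive matching of duplicate answers are hypothetical choices that the summary does not specify.

```python
from collections import Counter


def majority_vote(answers, tie_break="yes"):
    """Return the yes/no answer given by most models.

    Assumption: an even split falls back to `tie_break`; the source
    does not say how ties are handled.
    """
    counts = Counter(a.strip().lower() for a in answers)
    yes, no = counts.get("yes", 0), counts.get("no", 0)
    if yes == no:
        return tie_break
    return "yes" if yes > no else "no"


def union_answers(answer_lists):
    """Union of the models' list/factoid answers, in first-seen order.

    Assumption: answers that differ only in case or surrounding
    whitespace are treated as the same entity.
    """
    seen = set()
    merged = []
    for answers in answer_lists:
        for answer in answers:
            key = answer.strip().lower()
            if key and key not in seen:
                seen.add(key)
                merged.append(answer.strip())
    return merged
```

For example, majority_vote(["yes", "no", "Yes"]) returns "yes", and union_answers([["BRCA1", "TP53"], ["tp53", "EGFR"]]) returns ["BRCA1", "TP53", "EGFR"].
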
Dimitra Panou
Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Greece; Institute for Fundamental Biomedical Science, Biomedical Sciences Research Center "Alexander Fleming", Greece
Alexandros C. Dimopoulos
Institute for Fundamental Biomedical Science, Biomedical Sciences Research Center "Alexander Fleming", Greece; Department of Informatics & Telematics, School of Digital Technology, Harokopio University, Greece
Manolis Koubarakis
Professor, Dept. of Informatics and Telecommunications, National and Kapodistrian University of
Artificial Intelligence, Semantic Web and Linked Data, Big Data, Earth Observation, Machine Learning
Martin Reczko
Institute for Fundamental Biomedical Science, Biomedical Sciences Research Center "Alexander Fleming", Greece