🤖 AI Summary
To address the declining robustness of question-answering (QA) systems amid the exponential growth of biomedical literature, this paper proposes a multi-LLM collective intelligence fusion framework. The method leverages 13 open-source large language models integrated with retrieval-augmented generation (RAG) and task-adaptive ensemble strategies: majority voting for yes/no questions and answer merging with deduplication for list-type questions. Furthermore, it introduces question-type–specific model pipelines to exploit model specialization and synergistic effects. Evaluated on the Synergy task of the 2025 BioASQ challenge, the framework achieves first place for ideal answers and second place for exact answers in round 2, as well as two shared first places for exact answers in rounds 3 and 4. These results empirically validate the efficacy and strong performance of heterogeneous multi-model ensembling for biomedical QA.
📝 Abstract
Biomedical text mining and question-answering are essential yet highly demanding tasks, particularly in the face of the exponential growth of biomedical literature. In this work, we present our participation in the 13th edition of the BioASQ challenge, which involves biomedical semantic question-answering for Task 13b and biomedical question-answering on developing topics for the Synergy task. We deploy a selection of open-source large language models (LLMs) as retrieval-augmented generators to answer biomedical questions. Various models are used to process the questions. A majority voting system combines their output to determine the final answer for Yes/No questions, while for list and factoid type questions, the union of their answers is used. We evaluated 13 state-of-the-art open-source LLMs, exploring all possible model combinations contributing to the final answer, resulting in tailored LLM pipelines for each question type. Our findings provide valuable insights into which combinations of LLMs consistently produce superior results for specific question types. In the four rounds of the 2025 BioASQ challenge, our system achieved notable results: in the Synergy task, we secured 1st place for ideal answers and 2nd place for exact answers in round 2, as well as two shared 1st places for exact answers in rounds 3 and 4.
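The two ensemble strategies described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the case-insensitive deduplication key, and the tie-breaking rule for an even yes/no split are all assumptions for the sake of the example.

```python
from collections import Counter


def vote_yes_no(answers):
    """Combine per-model 'yes'/'no' answers by majority vote.

    Tie-breaking toward 'yes' is an assumption; the paper does not
    specify how ties are resolved.
    """
    counts = Counter(a.strip().lower() for a in answers)
    return "yes" if counts["yes"] >= counts["no"] else "no"


def merge_list_answers(answer_lists):
    """Take the union of per-model list answers, deduplicated.

    Deduplication is case-insensitive and order-preserving: the first
    surface form of each entity encountered is kept.
    """
    seen, merged = set(), []
    for answers in answer_lists:
        for item in answers:
            key = item.strip().lower()
            if key not in seen:
                seen.add(key)
                merged.append(item.strip())
    return merged
```

For example, three models answering `["Yes", "no", "yes"]` would yield `"yes"`, and list answers `[["BRCA1", "TP53"], ["tp53", "EGFR"]]` would merge to `["BRCA1", "TP53", "EGFR"]`.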