🤖 AI Summary
To address the declining robustness of question-answering (QA) systems amid the exponential growth of biomedical literature, this paper proposes a multi-LLM collective intelligence fusion framework. The method leverages 13 open-source large language models integrated with retrieval-augmented generation (RAG) and task-adaptive ensemble strategies: majority voting for yes/no questions and answer merging with deduplication for list-type questions. Furthermore, it introduces question-type–specific model pipelines to exploit model specialization and synergistic effects. Evaluated on the Synergy task of the 2025 BioASQ challenge, the framework achieves first place for ideal answers and second place for exact answers in round 2, as well as two shared first places for exact answers in rounds 3 and 4. These results empirically validate the efficacy and strong performance of heterogeneous multi-model ensembling for biomedical QA.
📝 Abstract
Biomedical text mining and question-answering are essential yet highly demanding tasks, particularly in the face of the exponential growth of biomedical literature. In this work, we present our participation in the 13th edition of the BioASQ challenge, which involves biomedical semantic question-answering for Task 13b and biomedical question-answering on developing topics for the Synergy task. We deploy a selection of open-source large language models (LLMs) as retrieval-augmented generators to answer biomedical questions. Various models are used to process the questions. A majority voting system combines their output to determine the final answer for Yes/No questions, while for list and factoid type questions, the union of their answers is used. We evaluated 13 state-of-the-art open-source LLMs, exploring all possible model combinations contributing to the final answer, resulting in tailored LLM pipelines for each question type. Our findings provide valuable insights into which combinations of LLMs consistently produce superior results for specific question types. In the four rounds of the 2025 BioASQ challenge, our system achieved notable results: in the Synergy task, we secured 1st place for ideal answers and 2nd place for exact answers in round 2, as well as two shared 1st places for exact answers in rounds 3 and 4.
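The two ensemble strategies described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the case-insensitive deduplication key, and the tie-breaking rule for an even yes/no split are all assumptions for the sake of the example.

```python
from collections import Counter


def vote_yes_no(answers):
    """Combine per-model 'yes'/'no' answers by majority vote.

    Tie-breaking toward 'yes' is an assumption; the paper does not
    specify how ties are resolved.
    """
    counts = Counter(a.strip().lower() for a in answers)
    return "yes" if counts["yes"] >= counts["no"] else "no"


def merge_list_answers(answer_lists):
    """Take the union of per-model list answers, deduplicated.

    Deduplication is case-insensitive and order-preserving: the first
    surface form of each entity encountered is kept.
    """
    seen, merged = set(), []
    for answers in answer_lists:
        for item in answers:
            key = item.strip().lower()
            if key not in seen:
                seen.add(key)
                merged.append(item.strip())
    return merged
```

For example, three models answering `["Yes", "no", "yes"]` would yield `"yes"`, and list answers `[["BRCA1", "TP53"], ["tp53", "EGFR"]]` would merge to `["BRCA1", "TP53", "EGFR"]`.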