🤖 AI Summary
Medical question-answering systems face critical challenges including hallucination, bias, high computational overhead, privacy risks, and difficulty integrating cross-specialty knowledge. To address these, we propose a multi-agent architecture tailored for complex medical QA. Our approach features: (1) a dynamic routing mechanism in which a Router Agent selectively dispatches queries to ten lightweight (1B-parameter) models fine-tuned on specific medical domains; (2) an Orchestrator Agent that synthesizes coherent, consensus-driven answers from the selected specialist agents; and (3) support for low-resource, on-device deployment to improve privacy and efficiency. Evaluated on the Italian Medical Forum dataset, our system achieves ROUGE-1 = 0.301 and BERTScore F1 = 0.697, outperforming monolithic baselines of up to 14B parameters, while significantly mitigating hallucination and bias. The framework strikes a robust balance among computational efficiency, clinical accuracy, and data privacy.
📝 Abstract
Medical question answering systems face deployment challenges including hallucinations, bias, computational demands, privacy concerns, and the need for specialized expertise across diverse domains. Here, we present SOLVE-Med, a multi-agent architecture that combines domain-specialized small language models to answer complex medical queries. The system employs a Router Agent for dynamic specialist selection, ten specialized models (1B parameters each) fine-tuned on specific medical domains, and an Orchestrator Agent that synthesizes the final response. Evaluated on Italian medical forum data spanning ten specialties, SOLVE-Med achieves ROUGE-1 of 0.301 and BERTScore F1 of 0.697, outperforming standalone models of up to 14B parameters while enabling local deployment. Our code is publicly available on GitHub: https://github.com/PRAISELab-PicusLab/SOLVE-Med.
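The Router → specialists → Orchestrator flow described above can be sketched conceptually as follows. This is a minimal illustrative stand-in, not the SOLVE-Med implementation: the specialty names, the keyword-based routing rule, and the string-joining "synthesis" are assumptions for demonstration, whereas the real system uses fine-tuned 1B-parameter language models for each role.

```python
# Hypothetical sketch of the SOLVE-Med multi-agent flow (illustrative only).
# In the actual system, each function below would be backed by a fine-tuned
# 1B-parameter language model; here they are simple stand-ins.

SPECIALTIES = {
    "cardiology": ["heart", "chest pain", "arrhythmia"],
    "dermatology": ["skin", "rash", "mole"],
    "neurology": ["headache", "seizure", "numbness"],
}

def router_agent(query: str) -> list[str]:
    """Select the specialist agents relevant to a query (keyword stand-in)."""
    q = query.lower()
    selected = [s for s, kws in SPECIALTIES.items() if any(k in q for k in kws)]
    return selected or ["general"]

def specialist_agent(specialty: str, query: str) -> str:
    """Stand-in for a domain-specific model producing a draft answer."""
    return f"[{specialty}] draft answer to: {query}"

def orchestrator_agent(drafts: list[str]) -> str:
    """Stand-in for synthesizing one consensus answer from the drafts."""
    return " | ".join(drafts)

def answer(query: str) -> str:
    """Full pipeline: route, query the selected specialists, synthesize."""
    drafts = [specialist_agent(s, query) for s in router_agent(query)]
    return orchestrator_agent(drafts)
```

Routing only a small subset of specialists per query is what keeps the per-query cost close to a single 1B model, which is why the ensemble can remain deployable on-device.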