🤖 AI Summary
Genomic question-answering systems that rely on closed-source models scale poorly, raise privacy risks, and generalize weakly. To address these issues, we propose OpenBioLLM, a modular multi-agent framework built on open-source large language models (e.g., Llama 3.1, Qwen2.5). It decouples three specialized agents (tool routing, query generation, and response validation), enabling role-based collaboration and chain-of-thought reasoning without model fine-tuning, and so supports diverse open-weight backbones. Evaluated on the Gene-Turing and GeneHop benchmarks, OpenBioLLM achieves average scores of 0.849 and 0.830, respectively. These scores are comparable to or better than those of GeneGPT, which pairs domain-specific APIs with the proprietary code-davinci-002 model, and inference latency drops by 40–50%. Because the framework runs on open-weight models, it also lessens the privacy, scalability, and cost concerns of integrating biomedical knowledge through a proprietary model.
📝 Abstract
Genomic question answering often requires complex reasoning and integration across diverse biomedical sources. GeneGPT addressed this challenge by combining domain-specific APIs with OpenAI's code-davinci-002 large language model to enable natural language interaction with genomic databases. However, its reliance on a proprietary model limits scalability, increases operational costs, and raises concerns about data privacy and generalization. In this work, we first reproduce GeneGPT in a pilot study using open-source models, including Llama 3.1, Qwen2.5, and Qwen2.5 Coder, within a monolithic architecture; this allows us to identify the limitations of that approach. Building on this foundation, we then develop OpenBioLLM, a modular multi-agent framework that extends GeneGPT by introducing agent specialization for tool routing, query generation, and response validation. This enables coordinated reasoning and role-based task execution. OpenBioLLM matches or outperforms GeneGPT on over 90% of the benchmark tasks, achieving average scores of 0.849 on Gene-Turing and 0.830 on GeneHop, while using smaller open-source models without additional fine-tuning or tool-specific pretraining. OpenBioLLM's modular multi-agent design also reduces latency by 40–50% across benchmark tasks, improving efficiency without compromising model capability. Our evaluation highlights the potential of open-source multi-agent systems for genomic question answering. Code and resources are available at https://github.com/ielab/OpenBioLLM.
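To make the three roles named in the abstract concrete, the following is a minimal sketch of how a router, a query generator, and a validator might be chained. It is not the OpenBioLLM implementation: the function names, prompt wording, tool names (`gene_lookup`, `sequence_search`), and the single-pass control flow are all assumptions made for illustration, and the language model is replaced by a stub so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Assumption: an "LLM" is any function from prompt text to completion text,
# so each agent could be backed by a different open-weight model.
LLM = Callable[[str], str]
# Assumption: a "tool" takes a query string and returns raw response text.
Tool = Callable[[str], str]


@dataclass
class Answer:
    tool: str        # tool chosen by the routing agent
    query: str       # query written by the query-generation agent
    text: str        # raw tool output
    validated: bool  # verdict of the validation agent


def route_tool(llm: LLM, question: str, tools: Dict[str, Tool]) -> str:
    """Routing agent: ask the model to name one tool; fall back to the first."""
    prompt = f"Tools: {', '.join(tools)}\nQuestion: {question}\nTool:"
    choice = llm(prompt).strip()
    return choice if choice in tools else next(iter(tools))


def generate_query(llm: LLM, question: str, tool: str) -> str:
    """Query-generation agent: turn the question into a query for the tool."""
    prompt = f"Write a query for tool '{tool}'.\nQuestion: {question}\nQuery:"
    return llm(prompt).strip()


def validate_response(llm: LLM, question: str, raw: str) -> bool:
    """Validation agent: ask the model whether the evidence answers the question."""
    prompt = (
        f"Question: {question}\nEvidence: {raw}\n"
        "Does the evidence answer the question? yes/no:"
    )
    return llm(prompt).strip().lower().startswith("yes")


def answer(question: str, tools: Dict[str, Tool],
           router: LLM, generator: LLM, validator: LLM) -> Answer:
    """Run the three agents in sequence (single pass, no retries)."""
    tool = route_tool(router, question, tools)
    query = generate_query(generator, question, tool)
    raw = tools[tool](query)
    return Answer(tool, query, raw, validate_response(validator, question, raw))


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model; keys off the prompt suffix."""
    if prompt.endswith("Tool:"):
        return "gene_lookup"
    if prompt.endswith("Query:"):
        return "BRCA1"
    return "yes"


# Hypothetical tools that echo the query instead of calling a database.
tools: Dict[str, Tool] = {
    "gene_lookup": lambda q: f"record for {q}",
    "sequence_search": lambda q: f"alignment for {q}",
}

result = answer("Which chromosome is BRCA1 on?", tools, stub_llm, stub_llm, stub_llm)
print(result)
```

Because each agent only depends on the `LLM` callable, a different backbone could be plugged into each role without changing the pipeline, which is one plausible reading of how a modular design supports diverse open-weight models.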