🤖 AI Summary
This study addresses the low efficiency of knowledge integration in microbial protein research for sustainable protein production. We propose a domain-specific multi-agent AI framework comprising a dual-agent collaborative architecture that integrates retrieval-augmented generation (RAG), fine-tuned large language models (LLMs), and prompt engineering to enable automated scientific literature retrieval, key information extraction, and structured knowledge synthesis. Our contributions include: (1) the first synergistic application of RAG and LLM fine-tuning for knowledge mining in the microbial protein domain; (2) a scalable chemical safety retrieval module; and (3) a lightweight user interface supporting practical deployment. Experiments demonstrate an average cosine similarity of 0.94 for extracted information—5.6% higher than baseline methods—with robust system performance. The framework is open-sourced and empirically validated for domain adaptability and real-world utility.
📝 Abstract
The global demand for sustainable protein sources has accelerated the need for intelligent tools that can rapidly process and synthesise domain-specific scientific knowledge. In this study, we present a proof-of-concept multi-agent Artificial Intelligence (AI) framework designed to support sustainable protein production research, with an initial focus on microbial protein sources. Our Retrieval-Augmented Generation (RAG)-oriented system consists of two GPT-based LLM agents: (1) a literature search agent that retrieves relevant scientific literature on microbial protein production for a specified microbial strain, and (2) an information extraction agent that processes the retrieved content to extract relevant biological and chemical information. Two parallel methodologies, fine-tuning and prompt engineering, were explored for agent optimisation. Both methods demonstrated effectiveness at improving the performance of the information extraction agent in terms of transformer-based cosine similarity scores between obtained and ideal outputs. Mean cosine similarity scores were increased by up to 25%, while universally reaching mean scores of $geq 0.89$ against ideal output text. Fine-tuning overall improved the mean scores to a greater extent (consistently of $geq 0.94$) compared to prompt engineering, although lower statistical uncertainties were observed with the latter approach. A user interface was developed and published for enabling the use of the multi-agent AI system, alongside preliminary exploration of additional chemical safety-based search capabilities