🤖 AI Summary
Current large language models (LLMs) show limited domain expertise, weak cross-source knowledge integration, and difficulty identifying research gaps in photosynthesis research. To address these challenges, we propose PRAG, a plant science–specific AI assistant featuring the first photosynthesis-dedicated retrieval-augmented generation (RAG) framework. PRAG integrates cross-literature knowledge graph alignment, automated feedback-driven prompt optimization, and source-aware RAG. Built upon GPT-4o, it combines vector-based retrieval, dynamic prompt tuning, and a closed-loop feedback mechanism. Experimental results show an average 8.7% improvement across five scientific writing metrics, a 25.4% gain in source transparency, and knowledge graph entity matching rates of 63% (against papers in the database) and 39.5% (against test papers outside the database). PRAG's scientific depth and domain coverage are also comparable to those of photosynthesis research papers. These results indicate that domain-specific retrieval and prompt optimization can improve the accuracy, interpretability, and domain adaptability of LLMs on complex bioscience tasks.
📝 Abstract
The development of biological data analysis tools and large language models (LLMs) has opened up new possibilities for using AI in plant science research, with the potential to contribute significantly to knowledge integration and research gap identification. Nonetheless, current LLMs struggle to handle complex biological data and theoretical models in photosynthesis research and often fail to provide accurate scientific context. Therefore, this study proposed a photosynthesis research assistant (PRAG) based on OpenAI's GPT-4o, combining retrieval-augmented generation (RAG) techniques with prompt optimization. Vector databases and an automated feedback loop were used in the prompt optimization process to enhance the accuracy and relevance of responses to photosynthesis-related queries. PRAG showed an average improvement of 8.7% across five metrics related to scientific writing, including a 25.4% increase in source transparency. Additionally, its scientific depth and domain coverage were comparable to those of photosynthesis research papers. A knowledge graph was used to align PRAG's responses with papers within and outside the database, which allowed PRAG to match key entities with 63% of the database papers and 39.5% of the test papers. PRAG can be applied to photosynthesis research and to broader plant science domains, paving the way for more in-depth data analysis and predictive capabilities.
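To make the retrieval step concrete, the following is a minimal sketch of source-aware retrieval and prompt assembly, not the authors' implementation. The corpus, the source labels, and the helper names (`embed`, `cosine`, `retrieve`, `build_prompt`) are all hypothetical, and a toy bag-of-words vector stands in for the learned embeddings a vector database would hold. The call to GPT-4o and the automated feedback loop are omitted.

```python
import math
from collections import Counter

# Hypothetical toy corpus; a real system would store dense embeddings
# of paper passages in a vector database.
CORPUS = [
    {"source": "paper_A", "text": "rubisco limits carbon fixation in the calvin cycle"},
    {"source": "paper_B", "text": "photosystem II drives water oxidation in the light reactions"},
    {"source": "paper_C", "text": "stomatal conductance regulates leaf gas exchange"},
]


def embed(text):
    """Toy bag-of-words vector standing in for a learned embedding (assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d["text"])), reverse=True)
    return ranked[:k]


def build_prompt(query, passages):
    """Assemble a prompt that keeps each passage tagged with its source."""
    context = "\n".join(f"[{p['source']}] {p['text']}" for p in passages)
    return (
        "Answer using only the passages below and cite sources in brackets.\n"
        f"{context}\n\nQuestion: {query}"
    )


query = "what limits carbon fixation"
top = retrieve(query, CORPUS, k=2)
prompt = build_prompt(query, top)
print(prompt)
```

Keeping the source label attached to each retrieved passage is one plausible way to support source transparency: the model sees which paper each statement came from and can cite it in the answer.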