SeedBench: A Multi-task Benchmark for Evaluating Large Language Models in Seed Science

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Seed science faces interdisciplinary complexity, data scarcity, and a lack of standardized benchmarks—key barriers hindering the application of large language models (LLMs) in plant breeding. To address this, we propose SeedBench, the first multi-task evaluation benchmark specifically designed for seed science, covering core tasks including germplasm identification, hybrid prediction, genotype–phenotype inference, and breeding decision support. SeedBench integrates domain expertise to construct realistic task chains and a curated multimodal seed dataset, enabling zero-shot and few-shot evaluation. Comprehensive assessment across 26 state-of-the-art LLMs—including proprietary, open-source, and domain-adapted variants—reveals a substantial capability gap in agricultural reasoning for general-purpose models, while domain-tuned models demonstrate marked performance gains. SeedBench establishes a reproducible, extensible evaluation framework for agri-AI, providing both rigorous benchmarking standards and actionable pathways for advancing LLM-driven precision breeding.

📝 Abstract
Seed science is essential for modern agriculture, directly influencing crop yields and global food security. However, challenges such as interdisciplinary complexity and high costs with limited returns hinder progress, leading to a shortage of experts and insufficient technological support. While large language models (LLMs) have shown promise across various fields, their application in seed science remains limited due to the scarcity of digital resources, complex gene-trait relationships, and the lack of standardized benchmarks. To address this gap, we introduce SeedBench -- the first multi-task benchmark specifically designed for seed science. Developed in collaboration with domain experts, SeedBench focuses on seed breeding and simulates key aspects of modern breeding processes. We conduct a comprehensive evaluation of 26 leading LLMs, encompassing proprietary, open-source, and domain-specific fine-tuned models. Our findings not only highlight the substantial gap between the capabilities of current LLMs and real-world seed science problems, but also take a foundational step toward research on LLMs for seed design.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs in seed science lacks standardized benchmarks
Seed science faces interdisciplinary complexity and resource scarcity
Assessing LLM performance in seed breeding and gene-trait relationships
Innovation

Methods, ideas, or system contributions that make the work stand out.

First multi-task benchmark for seed science
Evaluates 26 leading LLMs comprehensively
Simulates modern seed breeding processes
Jie Ying
Shanghai Artificial Intelligence Laboratory
Zihong Chen
Shanghai Artificial Intelligence Laboratory
Zhefan Wang
Shanghai Artificial Intelligence Laboratory
Wanli Jiang
Shanghai Artificial Intelligence Laboratory
Chenyang Wang
Shanghai Artificial Intelligence Laboratory
Zhonghang Yuan
Shanghai Artificial Intelligence Laboratory
Haoyang Su
Shanghai Artificial Intelligence Laboratory
Huanjun Kong
Shanghai AI Laboratory
Fan Yang
Yazhouwan National Laboratory
Nanqing Dong
Shanghai Artificial Intelligence Laboratory; University of Oxford
Machine Learning · Computer Vision · Optimization · AI for Science