SeedBench: A Multi-task Benchmark for Evaluating Large Language Models in Seed Science

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Seed science faces interdisciplinary complexity, data scarcity, and a lack of standardized benchmarks—key barriers hindering the application of large language models (LLMs) in plant breeding. To address this, we propose SeedBench, the first multi-task evaluation benchmark specifically designed for seed science, covering core tasks including germplasm identification, hybrid prediction, genotype–phenotype inference, and breeding decision support. SeedBench integrates domain expertise to construct realistic task chains and a curated multimodal seed dataset, enabling zero-shot and few-shot evaluation. Comprehensive assessment across 26 state-of-the-art LLMs—including proprietary, open-source, and domain-adapted variants—reveals a substantial capability gap in agricultural reasoning for general-purpose models, while domain-tuned models demonstrate marked performance gains. SeedBench establishes a reproducible, extensible evaluation framework for agri-AI, providing both rigorous benchmarking standards and actionable pathways for advancing LLM-driven precision breeding.

📝 Abstract
Seed science is essential for modern agriculture, directly influencing crop yields and global food security. However, challenges such as interdisciplinary complexity and high costs with limited returns hinder progress, leading to a shortage of experts and insufficient technological support. While large language models (LLMs) have shown promise across various fields, their application in seed science remains limited due to the scarcity of digital resources, complex gene-trait relationships, and the lack of standardized benchmarks. To address this gap, we introduce SeedBench -- the first multi-task benchmark specifically designed for seed science. Developed in collaboration with domain experts, SeedBench focuses on seed breeding and simulates key aspects of modern breeding processes. We conduct a comprehensive evaluation of 26 leading LLMs, encompassing proprietary, open-source, and domain-specific fine-tuned models. Our findings not only highlight the substantial gap between the capabilities of current LLMs and real-world seed science problems, but also take a foundational step toward research on LLMs for seed design.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs in seed science lacks standardized benchmarks
Seed science faces interdisciplinary complexity and resource scarcity
Assessing LLM performance in seed breeding and gene-trait relationships
Innovation

Methods, ideas, or system contributions that make the work stand out.

First multi-task benchmark for seed science
Evaluates 26 leading LLMs comprehensively
Simulates modern seed breeding processes
Jie Ying
Shanghai Artificial Intelligence Laboratory
Zihong Chen
Shanghai Artificial Intelligence Laboratory
Zhefan Wang
Shanghai Artificial Intelligence Laboratory
Wanli Jiang
Shanghai Artificial Intelligence Laboratory
Chenyang Wang
Shanghai Artificial Intelligence Laboratory
Zhonghang Yuan
Shanghai Artificial Intelligence Laboratory
Haoyang Su
Shanghai Artificial Intelligence Laboratory
Huanjun Kong
Shanghai AI Laboratory
Fan Yang
Yazhouwan National Laboratory
Nanqing Dong
Shanghai Artificial Intelligence Laboratory; University of Oxford
Machine Learning · Computer Vision · Optimization · AI for Science