AI Summary
Generic large language models (LLMs) exhibit weak regional adaptation, insufficient multilingual support, and poor domain knowledge generalization in agricultural question answering. To address these limitations, this work introduces a multilingual synthetic data generation method grounded in domain-specific agricultural documentation, producing a high-quality agricultural QA dataset covering English, Hindi, and Punjabi. Building upon this resource, we propose a language-specific fine-grained fine-tuning strategy to enhance the model's capacity to capture localized farming practices, terminological consistency, and agricultural factual accuracy. Experimental results demonstrate that our approach significantly outperforms baseline models on a multilingual agricultural benchmark: factual accuracy improves by 18.7%, content relevance by 22.3%, and agricultural consensus by 15.9%. To our knowledge, this is the first method enabling native-level, high-precision, and strongly localized agricultural technical Q&A support in low-resource language settings.
Abstract
Enabling farmers to access accurate, timely agricultural information in their native languages is crucial to the success of the agricultural sector. Although large language models (LLMs) can be used to implement Question Answering (QA) systems, publicly available general-purpose LLMs typically offer only generic advisories when applied to agriculture. They lack precision in local and multilingual contexts because of insufficient domain-specific training and a scarcity of high-quality, region-specific datasets. Our study addresses these limitations by generating multilingual synthetic agricultural datasets (English, Hindi, Punjabi) from agriculture-specific documents and fine-tuning language-specific LLMs. Our evaluation on curated multilingual datasets demonstrates significant improvements in factual accuracy, relevance, and agricultural consensus for the fine-tuned models compared to their baseline counterparts. These results highlight synthetic data-driven, language-specific fine-tuning as an effective strategy for improving the performance of LLMs in agriculture, especially in multilingual and low-resource settings. By enabling more accurate and localized agricultural advisory services, this study provides a meaningful step toward bridging the knowledge gap in AI-driven agricultural solutions for diverse linguistic communities.