Towards AI Evaluation in Domain-Specific RAG Systems: The AgriHubi Case Study

📅 2026-02-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the limited applicability of large language models (LLMs) to knowledge-intensive domains such as agriculture, particularly in low-resource languages like Finnish, where weak factual grounding, English-dominated training data, and a lack of real-world evaluation limit their usefulness. The authors present AgriHubi, a domain-adapted retrieval-augmented generation (RAG) system for Finnish-language agricultural decision support. AgriHubi integrates Finnish agricultural documents with open PORO family models and combines explicit source grounding with user feedback to support iterative refinement. Developed over eight iterations and evaluated through two user studies, the system shows clear gains in answer completeness, linguistic accuracy, and perceived reliability. The results also reveal practical trade-offs between response quality and latency when deploying larger models, offering empirical guidance for designing and evaluating domain-specific RAG systems in low-resource language settings.

📝 Abstract

Large language models show promise for knowledge-intensive domains, yet their use in agriculture is constrained by weak grounding, English-centric training data, and limited real-world evaluation. These issues are amplified for low-resource languages, where high-quality domain documentation exists but remains difficult to access through general-purpose models. This paper presents AgriHubi, a domain-adapted retrieval-augmented generation (RAG) system for Finnish-language agricultural decision support. AgriHubi integrates Finnish agricultural documents with open PORO family models and combines explicit source grounding with user feedback to support iterative refinement. Developed over eight iterations and evaluated through two user studies, the system shows clear gains in answer completeness, linguistic accuracy, and perceived reliability. The results also reveal practical trade-offs between response quality and latency when deploying larger models. This study provides empirical guidance for designing and evaluating domain-specific RAG systems in low-resource language settings.
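
To make the idea of explicit source grounding concrete, here is a minimal sketch of a retrieval step that keeps a source identifier attached to every passage and carries it into the prompt. It is not the AgriHubi implementation: the toy corpus, the bag-of-words cosine scoring, and the names `DOCS`, `retrieve`, and `build_prompt` are all assumptions made for illustration, and the generation step (for example, a call to a PORO model) is left out.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: sources and sentences are invented for illustration.
DOCS = [
    {"source": "doc-1", "text": "Ohra kylvetään keväällä, kun maa on lämmennyt."},
    {"source": "doc-2", "text": "Perunan varastointi vaatii viileän ja pimeän tilan."},
    {"source": "doc-3", "text": "Ohran lannoitus suunnitellaan maanäytteen perusteella."},
]


def tokenize(text):
    """Lowercase and split on non-word characters (no stemming or lemmatization)."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=2):
    """Return up to k documents with non-zero similarity, best match first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d["text"]))), d) for d in docs]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(query, passages):
    """Number each passage and keep its source id so the answer can cite it."""
    context = "\n".join(
        f"[{i}] ({p['source']}) {p['text']}" for i, p in enumerate(passages, 1)
    )
    return (
        "Answer using only the numbered sources and cite them.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


query = "Milloin ohra kylvetään?"
passages = retrieve(query, DOCS)
prompt = build_prompt(query, passages)
```

Exact token matching misses inflected forms (here `ohran` in `doc-3` does not match `ohra` in the query), which hints at why retrieval for a morphologically rich language like Finnish typically needs lemmatization or embedding-based search rather than this simplified scoring.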

Problem

Research questions and friction points this paper is trying to address.

domain-specific RAG

low-resource languages

agricultural decision support

AI evaluation

knowledge grounding

Innovation

Methods, ideas, or system contributions that make the work stand out.

domain-specific RAG

low-resource languages

source grounding