Towards AI Evaluation in Domain-Specific RAG Systems: The AgriHubi Case Study

2026-02-02
AI Summary
This study addresses the limited applicability of large language models (LLMs) in knowledge-intensive domains such as agriculture for low-resource languages like Finnish, where challenges include weak factual grounding, English-dominated training data, and a lack of real-world evaluation. To tackle these issues, the authors propose AgriHubi, the first systematically constructed and evaluated domain-specific retrieval-augmented generation (RAG) framework tailored for Finnish agricultural contexts. AgriHubi integrates Finnish-language agricultural documents with the open-source PORO model and incorporates explicit source attribution alongside a user feedback-driven, multi-turn iterative refinement mechanism. Two rounds of user studies demonstrate significant improvements in answer completeness, linguistic accuracy, and perceived reliability. The findings also reveal practical trade-offs between model scale and response latency, offering empirical guidance for deploying RAG systems in low-resource language domains.

Abstract
Large language models show promise for knowledge-intensive domains, yet their use in agriculture is constrained by weak grounding, English-centric training data, and limited real-world evaluation. These issues are amplified for low-resource languages, where high-quality domain documentation exists but remains difficult to access through general-purpose models. This paper presents AgriHubi, a domain-adapted retrieval-augmented generation (RAG) system for Finnish-language agricultural decision support. AgriHubi integrates Finnish agricultural documents with open PORO family models and combines explicit source grounding with user feedback to support iterative refinement. Developed over eight iterations and evaluated through two user studies, the system shows clear gains in answer completeness, linguistic accuracy, and perceived reliability. The results also reveal practical trade-offs between response quality and latency when deploying larger models. This study provides empirical guidance for designing and evaluating domain-specific RAG systems in low-resource language settings.
Problem

Research questions and friction points this paper is trying to address.

domain-specific RAG
low-resource languages
agricultural decision support
AI evaluation
knowledge grounding
Innovation

Methods, ideas, or system contributions that make the work stand out.

domain-specific RAG
low-resource languages
source grounding
user feedback loop
agricultural AI
Md Toufique Hasan
Doctoral Researcher, Tampere University
LLMs, Generative AI, Machine Learning, Software Engineering
A. Khan
Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland
Mika Saari
Tampere University
AI, LLM, IoT, Teaching Of Programming, Sustainable Software Development
Vaishnavi Bankhele
Faculty of Information Technology and Communication Sciences, Tampere University, Tampere, Finland
Pekka Abrahamsson
Professor of Software Engineering at Tampere University, Finland
Software Engineering, Generative AI