🤖 AI Summary
This study addresses the unresolved cost-accuracy trade-off in choosing between two domain adaptation methods, retrieval-augmented generation (RAG) and fine-tuning, for industrial question answering. Applying an extended Cost-of-Pass framework to two proprietary automotive datasets, the authors conduct the first systematic evaluation of RAG and fine-tuning across both closed-source and open-source large language models in an industrial QA setting. Results show that high-end closed-source models achieve the best out-of-the-box performance, yet open-source models augmented with RAG attain comparable output quality. Across both model families, RAG is the most cost-effective approach when output quality, generation cost, and user interaction cost are weighed together, which offers practical guidance for industrial deployment.
📝 Abstract
Large Language Models (LLMs) are increasingly employed in enterprise question-answering (QA) systems, requiring adaptation to domain-specific knowledge. Among the most prevalent methods for incorporating such knowledge are Retrieval-Augmented Generation (RAG) and fine-tuning (FT). Yet, from a cost-accuracy trade-off perspective, it remains unclear which approach best suits industry scenarios. This study examines the impact of RAG and FT on two closed datasets specific to the automotive industry, assessing answer quality and operational costs. We extend the Cost-of-Pass framework proposed by Erol et al. (arXiv:2504.13359) to jointly assess output quality, generation cost, and user interaction cost. Our findings reveal that while premium models perform best out of the box, open-source models can achieve comparable quality when enhanced with RAG. Overall, RAG emerges as the most effective and cost-efficient adaptation method for both closed- and open-source models.
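For readers unfamiliar with the metric, the sketch below illustrates the basic Cost-of-Pass idea from Erol et al.: the expected cost of obtaining one correct answer, computed as the cost of a single attempt divided by the probability that an attempt is correct. The abstract does not state how the authors fold in user interaction cost, so the `interaction_cost_per_attempt` parameter and its additive treatment are assumptions made purely for illustration, as are the function and parameter names.

```python
import math


def cost_of_pass(
    generation_cost_per_attempt: float,
    success_rate: float,
    interaction_cost_per_attempt: float = 0.0,
) -> float:
    """Expected cost of obtaining one correct answer.

    Base form: cost per attempt divided by the probability of success.
    ASSUMPTION: user interaction cost is simply added to the per-attempt
    cost. The paper's actual extension may be defined differently.
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must lie in [0, 1]")
    if generation_cost_per_attempt < 0 or interaction_cost_per_attempt < 0:
        raise ValueError("costs must be non-negative")
    if success_rate == 0.0:
        # A system that never answers correctly has unbounded expected cost.
        return math.inf
    total_cost = generation_cost_per_attempt + interaction_cost_per_attempt
    return total_cost / success_rate
```

Under this reading, a cheaper system can still lose to a more expensive one if its success rate is low enough, because every failed attempt must be paid for before a correct answer arrives. This is why cost and accuracy have to be assessed jointly rather than separately.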