ALoFTRAG: Automatic Local Fine Tuning for Retrieval Augmented Generation

📅 2025-01-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Retrieval-Augmented Generation (RAG) systems can lose significant citation and answer accuracy when deployed on data from emerging domains such as healthcare and finance. To address this, the paper proposes an automated, local fine-tuning framework that requires neither human annotation nor large teacher models. The method is an unsupervised LoRA fine-tuning paradigm built on synthetic data generation and multi-stage quality filtering, and it jointly optimizes retrieval and generation across languages. Its key components are controllable synthetic data generation, multi-dimensional quality assessment, efficient LoRA parameter adaptation, and cross-lingual retrieval-generation co-optimization. Evaluated on 20 datasets spanning 26 languages, the approach achieves average improvements of 8.3% in citation accuracy and 3.0% in answer accuracy. Because training runs locally, the framework adapts to a target domain while keeping data private and deployment overhead low, which makes RAG systems more reliable in sensitive domains.

📝 Abstract
Retrieval Augmented Generation (RAG) systems have been shown to improve the accuracy of Large Language Model (LLM) outputs. However, these systems often achieve low accuracy when applied to new data domains. We introduce the Automatic Local Fine Tuning of Retrieval Augmented Generation models (ALoFTRAG) framework, designed to improve the accuracy of RAG systems on a given domain by training LLMs without manually labeled data or larger teacher models. By generating and filtering synthetic training data and performing LoRA fine-tuning, ALoFTRAG improves citation and answer accuracy across 20 datasets in 26 languages by, on average, 8.3% and 3.0% respectively. Our results demonstrate that ALoFTRAG offers a practical, cost-effective, and data-secure solution for improving RAG accuracy, making it particularly applicable to sensitive domains such as healthcare and finance.
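
The generate-then-filter step described in the abstract could look roughly like the sketch below. It is only an illustration under stated assumptions, not the paper's implementation: `QAPair`, `build_training_set`, `toy_generate`, and `toy_keep` are hypothetical names, and the toy functions stand in for an LLM-based question generator and a quality filter, neither of which is specified on this page. The LoRA fine-tuning step itself is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class QAPair:
    """One synthetic training example tied to the chunk it came from."""
    context: str
    question: str
    answer: str


def build_training_set(
    chunks: Iterable[str],
    generate_qa: Callable[[str], Optional[QAPair]],
    keep: Callable[[QAPair], bool],
) -> List[QAPair]:
    """Generate one candidate pair per chunk and keep those passing the filter."""
    dataset: List[QAPair] = []
    for chunk in chunks:
        pair = generate_qa(chunk)  # assumed to wrap a local LLM call
        if pair is not None and keep(pair):
            dataset.append(pair)
    return dataset


# Toy stand-ins (assumptions): a real system would prompt a model here.
def toy_generate(chunk: str) -> Optional[QAPair]:
    words = chunk.split()
    if not words:
        return None  # nothing to ask about an empty chunk
    question = f"What does the passage say about {words[0]}?"
    return QAPair(context=chunk, question=question, answer=chunk)


def toy_keep(pair: QAPair) -> bool:
    # Placeholder quality check: drop answers shorter than three words.
    return len(pair.answer.split()) >= 3


chunks = ["Aspirin reduces fever and pain.", "", "Too short"]
train = build_training_set(chunks, toy_generate, toy_keep)
print(len(train))
```

The surviving pairs would then serve as training data for a LoRA fine-tuning run on the same local model.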
Problem

Research questions and friction points this paper is trying to address.

RAG systems
accuracy degradation
domain-specific data
Innovation

Methods, ideas, or system contributions that make the work stand out.

ALoFTRAG
LoRA Fine-tuning
Domain-specific Enhancement