ALoFTRAG: Automatic Local Fine Tuning for Retrieval Augmented Generation

📅 2025-01-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the marked drop in citation and answer accuracy that Retrieval-Augmented Generation (RAG) systems show when deployed on data from new domains, such as healthcare and finance, this paper proposes an automated, local fine-tuning framework that requires neither human annotation nor large teacher models. The method uses annotation-free LoRA fine-tuning on synthetic data that passes through multi-stage quality filtering, jointly targeting retrieval and generation across languages. Key components include controllable synthetic data generation, multi-dimensional quality assessment, efficient LoRA parameter adaptation, and cross-lingual retrieval-generation co-optimization. Evaluated on 20 datasets spanning 26 languages, the approach improves citation accuracy by 8.3% and answer accuracy by 3.0% on average. Because training runs locally, the framework adapts to new domains, keeps data private, and has low deployment overhead. This makes RAG systems more reliable in sensitive domains.

📝 Abstract

Retrieval Augmented Generation (RAG) systems have been shown to improve the accuracy of Large Language Model (LLM) outputs. However, these models often perform poorly when applied to new data domains. We introduce the Automatic Local Fine Tuning of Retrieval Augmented Generation models (ALoFTRAG) framework. It is designed to improve the accuracy of RAG systems on a given domain by training LLMs without manually labeled data or larger teacher models. ALoFTRAG generates and filters synthetic training data and then performs LoRA fine-tuning. Across 20 datasets in 26 languages, this improves citation accuracy by an average of 8.3% and answer accuracy by an average of 3.0%. Our results demonstrate that ALoFTRAG offers a practical, cost-effective, and data-secure solution for improving RAG accuracy. This makes it particularly applicable to sensitive domains such as healthcare and finance.
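The abstract's "generate and filter synthetic training data" step can be pictured with a minimal sketch. This is a hypothetical illustration, not the authors' code. The following are all assumptions:

- the names `build_examples`, `generate_qa`, and `quality_score`;
- the score threshold;
- the numbered-context prompt layout;
- the `Citation: [k]` / `Answer:` target format.

In a real pipeline, `generate_qa` and `quality_score` would be calls to the local LLM. Here they are passed in as plain functions, and distractor chunks are drawn at random for simplicity.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class TrainingExample:
    prompt: str  # numbered context chunks followed by the question
    target: str  # citation of the source chunk followed by the answer


def build_examples(
    chunks: List[str],
    generate_qa: Callable[[str], Tuple[str, str]],  # hypothetical: LLM writes (question, answer) from a chunk
    quality_score: Callable[[str, str, str], float],  # hypothetical: LLM rates (chunk, question, answer)
    threshold: float = 0.5,
    n_distractors: int = 2,
    seed: int = 0,
) -> List[TrainingExample]:
    """Turn unlabeled domain chunks into filtered, citation-style training examples."""
    rng = random.Random(seed)
    examples: List[TrainingExample] = []
    for i, chunk in enumerate(chunks):
        question, answer = generate_qa(chunk)
        # Quality filter: discard synthetic pairs scored below the threshold.
        if quality_score(chunk, question, answer) < threshold:
            continue
        # Mix the source chunk with randomly chosen distractor chunks.
        pool = [j for j in range(len(chunks)) if j != i]
        order = rng.sample(pool, min(n_distractors, len(pool))) + [i]
        rng.shuffle(order)
        cite = order.index(i) + 1  # 1-based position of the source chunk
        context = "\n".join(f"[{n}] {chunks[j]}" for n, j in enumerate(order, start=1))
        examples.append(
            TrainingExample(
                prompt=f"{context}\n\nQuestion: {question}",
                target=f"Citation: [{cite}]\nAnswer: {answer}",
            )
        )
    return examples


if __name__ == "__main__":
    # Toy stand-ins for the LLM calls, for illustration only.
    docs = [
        "Aspirin irreversibly inhibits COX enzymes.",
        "A bond's coupon is its fixed interest payment.",
        "LoRA trains low-rank adapter matrices.",
    ]

    def toy_generate(chunk: str) -> Tuple[str, str]:
        return f"What does the text say about '{chunk.split()[0]}'?", chunk

    def toy_score(chunk: str, question: str, answer: str) -> float:
        return 1.0 if answer in chunk else 0.0

    for ex in build_examples(docs, toy_generate, toy_score):
        print(ex.prompt, ex.target, sep="\n---\n", end="\n\n")
```

Picking distractors that resemble the source chunk (for example, by embedding similarity) would give harder negatives than random sampling. The sketch keeps random sampling only to stay dependency-free.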

Problem

Research questions and friction points this paper is trying to address.

RAG systems

accuracy degradation

domain-specific data

Innovation

Methods, ideas, or system contributions that make the work stand out.

ALoFTRAG

LoRA Fine-tuning

Domain-specific Enhancement
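LoRA keeps a pretrained weight W frozen and learns a low-rank update. A layer then computes h = Wx + (α/r)·BAx, with A of shape r×d_in and B of shape d_out×r.

The pure-Python sketch below shows only the forward pass of that standard formulation. A is Gaussian-initialized and B is zero-initialized, so training starts from the base model's output; training would update A and B while W stays fixed. `LoRALinear` and `lora_param_count` are hypothetical names. The rank and α values are illustrative, since this listing does not state the paper's LoRA settings.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0):
        self.weight = weight  # pretrained W (d_out x d_in), never updated
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        # A starts small and random, B starts at zero, so B @ A = 0 initially.
        self.A: Matrix = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B: Matrix = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)  # W x
        delta = matvec(self.B, matvec(self.A, x))  # B (A x), never forms a d_out x d_in matrix
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self) -> int:
        return lora_param_count(len(self.A[0]), len(self.B), len(self.A))


if __name__ == "__main__":
    W = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
    layer = LoRALinear(W, rank=2, alpha=4.0)
    print(layer.forward([1.0, 1.0, 1.0]))  # equals W x at initialization
    print(lora_param_count(4096, 4096, 8), 4096 * 4096)
```

Consider a 4096×4096 projection with r = 8. `lora_param_count(4096, 4096, 8)` gives 65,536 trainable parameters, versus 16,777,216 in the frozen matrix, or about 0.4%. This small trainable footprint is what makes local fine-tuning cheap.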
