A Comparison of Independent and Joint Fine-tuning Strategies for Retrieval-Augmented Generation

📅 2025-10-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Selecting an optimal fine-tuning strategy for Retrieval-Augmented Generation (RAG) systems remains challenging because of trade-offs between performance gains and computational cost, particularly under varying data conditions (e.g., presence or absence of context labels) and hyperparameter-tuning requirements. Method: The authors systematically compare three fine-tuning paradigms (independent, joint, and two-phase) within a unified experimental framework across multiple QA benchmarks, evaluating their impact on the synergy between the embedding and generator models while quantifying training overhead. Contribution/Results: All strategies yield comparable improvements in EM/F1 (within ±0.8%), yet differ by up to 3.2× in training cost. Independent fine-tuning is most efficient when context labels are available; in label-scarce settings, two-phase fine-tuning achieves a superior performance-robustness balance while eliminating a costly learning-rate grid search. This work establishes empirically grounded decision boundaries for RAG fine-tuning strategy selection, delivering a reproducible, deployment-oriented guideline.

📝 Abstract
Authors: Neal Gregory Lawton, Alfy Samuel, Anoop Kumar, Daben Liu
Published: 20 Aug 2025 (last modified: 17 Sept 2025), EMNLP 2025 Findings. License: CC BY 4.0.
Keywords: Retrieval-Augmented Generation (RAG), Large Language Models (LLMs), Fine-tuning, Question Answering, Joint fine-tuning
TL;DR: We evaluate and compare strategies for fine-tuning Retrieval-Augmented Generation (RAG) pipelines, including independent fine-tuning, joint fine-tuning, and two-phase fine-tuning.
Abstract: Retrieval-augmented generation (RAG) is a popular framework for question answering powered by two large language models (LLMs): an embedding model that retrieves context documents relevant to a given question from a database, and a generator model that uses the retrieved context to generate an answer to the question. Both the embedding and generator models can be fine-tuned to improve the performance of a RAG pipeline on a new task, but multiple fine-tuning strategies exist, each with different costs and benefits. In this paper, we evaluate and compare several RAG fine-tuning strategies, including independent, joint, and two-phase fine-tuning. In our experiments, we observe that all of these strategies achieve roughly equal improvements in EM and F1 generation-quality metrics, although their computational costs differ significantly. We conclude that the optimal fine-tuning strategy depends on whether the training dataset includes context labels and whether a grid search over the learning rates of the embedding and generator models is required.
Problem

Research questions and friction points this paper is trying to address.

Which fine-tuning strategy (independent, joint, or two-phase) best adapts a RAG pipeline to a new task?
How do these strategies trade off generation quality against computational training cost?
How should the availability of context labels and the need for a learning-rate grid search inform strategy choice?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified empirical comparison of independent, joint, and two-phase fine-tuning for RAG pipelines
Finding that all strategies deliver roughly equal EM/F1 gains despite large differences in training cost
Practical decision guideline for strategy selection based on context-label availability and grid-search requirements
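The distinction between the three strategies is which parameters update against which loss, and when. A minimal toy sketch in Python (scalar stand-ins for the embedding and generator models, with made-up quadratic losses; this is an illustration of the schedules, not the paper's actual training setup):

```python
def grad_step(param, grad, lr=0.1):
    """One SGD update."""
    return param - lr * grad

def independent(emb, gen, steps=5):
    # Each model trains on its own loss: the embedder needs retrieval
    # supervision (context labels), the generator trains on answer labels.
    for _ in range(steps):
        emb = grad_step(emb, 2 * (emb - 1.0))  # toy retrieval loss: (emb - 1)^2
        gen = grad_step(gen, 2 * (gen - 2.0))  # toy generation loss: (gen - 2)^2
    return emb, gen

def joint(emb, gen, steps=5):
    # Both models update on one shared end-to-end loss, so no context
    # labels are needed, but two learning rates may need a grid search.
    for _ in range(steps):
        shared_grad = emb + gen - 3.0          # toy shared loss: (emb+gen-3)^2 / 2
        emb = grad_step(emb, shared_grad)
        gen = grad_step(gen, shared_grad)
    return emb, gen

def two_phase(emb, gen, steps=5):
    # Phase 1: fine-tune the embedder alone; Phase 2: freeze it and
    # fine-tune the generator against the updated retriever.
    for _ in range(steps):
        emb = grad_step(emb, 2 * (emb - 1.0))
    for _ in range(steps):
        gen = grad_step(gen, 2 * (emb + gen - 3.0))
    return emb, gen
```

The cost differences the paper measures follow from these schedules: joint fine-tuning backpropagates through both models every step, while the two-phase schedule touches only one model at a time and avoids searching over a pair of learning rates.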