TempPerturb-Eval: On the Joint Effects of Internal Temperature and External Perturbations in RAG Robustness

📅 2025-11-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing RAG evaluation methods typically assess retrieval quality or generation parameters (e.g., temperature) in isolation, neglecting their interplay. This work systematically investigates the joint effect of text perturbations, which model noisy retrieval, and LLM generation temperature. It proposes a joint analysis framework and a diagnostic benchmark built on HotpotQA, covering three perturbation types and multiple temperature levels across both open-source and proprietary LLMs. Experiments show that higher temperatures consistently amplify sensitivity to perturbations, that responses to some perturbation types are non-linear in temperature and exhibit critical thresholds, and that performance degradation follows distinct, identifiable patterns. The findings improve the interpretability and robustness of RAG systems under noisy retrieval and offer empirical guidance for jointly tuning retrieval and generation parameters.

📝 Abstract
The evaluation of Retrieval-Augmented Generation (RAG) systems typically examines retrieval quality and generation parameters like temperature in isolation, overlooking their interaction. This work presents a systematic investigation of how text perturbations (simulating noisy retrieval) interact with temperature settings across multiple LLM runs. We propose a comprehensive RAG Perturbation-Temperature Analysis Framework that subjects retrieved documents to three distinct perturbation types across varying temperature settings. Through extensive experiments on HotpotQA with both open-source and proprietary LLMs, we demonstrate that performance degradation follows distinct patterns: high-temperature settings consistently amplify vulnerability to perturbations, while certain perturbation types exhibit non-linear sensitivity across the temperature range. Our work yields three key contributions: (1) a diagnostic benchmark for assessing RAG robustness, (2) an analytical framework for quantifying perturbation-temperature interactions, and (3) practical guidelines for model selection and parameter tuning under noisy retrieval conditions.
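The experimental design described in the abstract, perturbation types crossed with temperature settings over repeated runs, can be sketched as a simple evaluation grid. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name its three perturbation types, so `drop_words`, `shuffle_sentences`, and `inject_distractor` are hypothetical placeholders, `generate` stands in for any LLM call, and exact-match scoring is assumed.

```python
import random
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical perturbations: the abstract does not name the three types used.
def drop_words(text: str, rng: random.Random, rate: float = 0.2) -> str:
    """Randomly delete a fraction of the words in the retrieved passage."""
    kept = [w for w in text.split() if rng.random() >= rate]
    return " ".join(kept) if kept else text

def shuffle_sentences(text: str, rng: random.Random) -> str:
    """Reorder the sentences of the retrieved passage."""
    parts = [s.strip() for s in text.split(".") if s.strip()]
    rng.shuffle(parts)
    return ". ".join(parts) + "."

def inject_distractor(text: str, rng: random.Random) -> str:
    """Append an irrelevant sentence to the retrieved passage."""
    return text + " Unrelated: the museum opens at nine."

PERTURBATIONS: Dict[str, Callable[[str, random.Random], str]] = {
    "none": lambda text, rng: text,  # clean baseline
    "drop_words": drop_words,
    "shuffle_sentences": shuffle_sentences,
    "inject_distractor": inject_distractor,
}

def exact_match(prediction: str, gold: str) -> float:
    """Assumed metric: case-insensitive exact match."""
    return float(prediction.strip().lower() == gold.strip().lower())

def run_grid(
    examples: List[Tuple[str, str, str]],        # (question, context, gold answer)
    generate: Callable[[str, str, float], str],  # stand-in for an LLM call
    temperatures: List[float],
    runs: int = 3,
    seed: int = 0,
) -> Dict[Tuple[str, float], float]:
    """Mean score for every (perturbation, temperature) cell of the grid."""
    rng = random.Random(seed)
    results: Dict[Tuple[str, float], float] = {}
    for name, temperature in product(PERTURBATIONS, temperatures):
        perturb = PERTURBATIONS[name]
        scores = []
        for question, context, gold in examples:
            noisy = perturb(context, rng)
            # Repeat generation because sampling at temperature > 0 is stochastic.
            for _ in range(runs):
                scores.append(exact_match(generate(question, noisy, temperature), gold))
        results[(name, temperature)] = sum(scores) / len(scores)
    return results
```

Keeping a `"none"` entry in the grid gives a clean baseline at every temperature, which is what makes it possible to separate the effect of temperature alone from the effect of perturbation at that temperature.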
Problem

Research questions and friction points this paper is trying to address.

Investigates interaction between retrieval noise and generation temperature in RAG systems
Proposes framework to analyze how perturbations affect performance across temperature settings
Provides guidelines for robust model selection under noisy retrieval conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Framework analyzing perturbation-temperature interactions in RAG
Benchmark assessing RAG robustness under noisy conditions
Guidelines for model selection with noisy retrieval
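
One simple way to quantify a perturbation-temperature interaction is to compare the score drop caused by a perturbation at each temperature against the clean baseline at the same temperature. The sketch below is an assumption for illustration, not the metric defined in the paper; the function names and the toy numbers are hypothetical and are not reported results.

```python
from typing import Dict, Tuple

# Mean score per (perturbation name, temperature) cell.
Scores = Dict[Tuple[str, float], float]

def degradation_by_temperature(
    scores: Scores, perturbation: str, clean: str = "none"
) -> Dict[float, float]:
    """Score drop relative to the clean baseline at each temperature."""
    temps = sorted(t for (name, t) in scores if name == perturbation)
    return {t: scores[(clean, t)] - scores[(perturbation, t)] for t in temps}

def amplification(scores: Scores, perturbation: str, clean: str = "none") -> float:
    """How much larger the drop is at the highest temperature than at the lowest.

    A positive value means temperature amplifies the perturbation's effect.
    """
    drops = degradation_by_temperature(scores, perturbation, clean)
    temps = sorted(drops)
    return drops[temps[-1]] - drops[temps[0]]

# Toy numbers for illustration only (not results from the paper).
toy_scores: Scores = {
    ("none", 0.0): 0.80,
    ("none", 1.0): 0.75,
    ("drop_words", 0.0): 0.70,
    ("drop_words", 1.0): 0.50,
}

if __name__ == "__main__":
    print(degradation_by_temperature(toy_scores, "drop_words"))
    print(amplification(toy_scores, "drop_words"))
```

Subtracting the clean score at the same temperature, rather than at a fixed reference temperature, keeps the baseline effect of temperature out of the degradation estimate.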