Optimizing Retrieval-Augmented Generation: Analysis of Hyperparameter Impact on Performance and Efficiency

📅 2025-05-13
🤖 AI Summary
This study systematically investigates the impact of hyperparameters on the speed–accuracy trade-off in Retrieval-Augmented Generation (RAG) systems, aiming to enhance response transparency, timeliness, and reliability. Through multidimensional evaluation—measuring faithfulness, correctness, relevancy, precision, recall, and semantic similarity—we quantitatively analyze key design choices: vector stores (Chroma vs. Faiss), chunking strategies (fixed-size vs. semantic chunking), cross-encoder re-ranking, and generation temperature. We find Chroma achieves 13% faster retrieval than Faiss but with marginally lower accuracy; fixed-size chunking with small windows significantly outperforms semantic chunking; and a corrective RAG framework attains 99% context precision, overcoming traditional RAG reliability bottlenecks. Notably, cross-encoder re-ranking yields only marginal gains in relevancy while increasing latency fivefold. These findings establish a verifiable, robust RAG configuration paradigm for high-stakes applications, such as clinical decision support.
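The fixed-size chunking the paper favours can be sketched in a few lines; the window and overlap values below are illustrative placeholders, not the settings evaluated in the study:

```python
def fixed_size_chunks(text: str, window: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows with a small overlap.

    A small window with minimal overlap is the regime the paper reports
    as outperforming semantic chunking; the exact sizes here are assumed.
    """
    if overlap >= window:
        raise ValueError("overlap must be smaller than window")
    step = window - overlap  # advance by window minus the shared overlap
    return [text[i:i + window] for i in range(0, len(text), step)]
```

Consecutive chunks share their last/first `overlap` characters, so no span of the source text is lost at a chunk boundary.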

📝 Abstract
Large language models achieve high task performance yet often hallucinate or rely on outdated knowledge. Retrieval-augmented generation (RAG) addresses these gaps by coupling generation with external search. We analyse how hyperparameters influence speed and quality in RAG systems, covering Chroma and Faiss vector stores, chunking policies, cross-encoder re-ranking, and temperature, and we evaluate six metrics: faithfulness, answer correctness, answer relevancy, context precision, context recall, and answer similarity. Chroma processes queries 13% faster, whereas Faiss yields higher retrieval precision, revealing a clear speed-accuracy trade-off. Naive fixed-length chunking with small windows and minimal overlap outperforms semantic segmentation while remaining the quickest option. Re-ranking provides modest gains in retrieval quality yet increases runtime by roughly a factor of 5, so its usefulness depends on latency constraints. These results help practitioners balance computational cost and accuracy when tuning RAG systems for transparent, up-to-date responses. Finally, we re-evaluate the top configurations with a corrective RAG workflow and show that their advantages persist when the model can iteratively request additional evidence. We obtain a near-perfect context precision (99%), which demonstrates that RAG systems can achieve extremely high retrieval accuracy with the right combination of hyperparameters, with significant implications for applications where retrieval quality directly impacts downstream task performance, such as clinical decision support in healthcare.
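The re-ranking stage the abstract weighs against its roughly fivefold runtime cost can be sketched as a second pass over retrieved candidates. A real system would score each (query, document) pair jointly with a cross-encoder model; the lexical-overlap scorer below is a toy stand-in so the sketch stays self-contained:

```python
def rerank(query: str, candidates: list[str], score_fn, top_k: int = 3) -> list[str]:
    """Re-order first-stage retrieval results by a (query, doc) relevance score.

    score_fn stands in for a cross-encoder; because it sees the query and
    document together, it is more precise but far slower than vector lookup,
    which is the latency trade-off the paper quantifies.
    """
    return sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)[:top_k]

def token_overlap(query: str, doc: str) -> float:
    """Toy stand-in scorer: fraction of query tokens that appear in the doc."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)
```

Example: `rerank("chunking strategies for rag", docs, token_overlap, top_k=2)` promotes the candidate sharing the most query tokens to the top of the list.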
Problem

Research questions and friction points this paper is trying to address.

Analyzing hyperparameter impact on RAG speed and quality
Evaluating trade-offs between retrieval precision and query speed
Optimizing chunking and re-ranking for accurate, efficient RAG systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes hyperparameter impact on RAG performance
Compares Chroma and Faiss vector stores efficiency
Evaluates chunking policies and re-ranking effects
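The vector-store comparison boils down to nearest-neighbour search over embeddings. As a baseline for what Chroma and Faiss accelerate, exact top-k retrieval by cosine similarity can be sketched as follows (both stores also offer approximate indexes that trade this exactness for the speed the paper measures):

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Exact (brute-force) nearest-neighbour search over (doc_id, vector) pairs."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

This brute-force scan is what approximate indexes replace with sublinear lookups, at some cost in retrieval precision.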