Optimizing Retrieval-Augmented Generation: Analysis of Hyperparameter Impact on Performance and Efficiency

📅 2025-05-13
🤖 AI Summary
This study systematically investigates the impact of hyperparameters on the speed–accuracy trade-off in Retrieval-Augmented Generation (RAG) systems, aiming to enhance response transparency, timeliness, and reliability. Through multidimensional evaluation—measuring faithfulness, correctness, relevancy, precision, recall, and semantic similarity—we quantitatively analyze key design choices: vector stores (Chroma vs. Faiss), chunking strategies (fixed-size vs. semantic chunking), cross-encoder re-ranking, and generation temperature. We find Chroma achieves 13% faster retrieval than Faiss but with marginally lower accuracy; fixed-size chunking with small windows significantly outperforms semantic chunking; and a corrective RAG framework attains 99% context precision, overcoming traditional RAG reliability bottlenecks. Notably, cross-encoder re-ranking yields only marginal gains in relevancy while increasing latency fivefold. These findings establish a verifiable, robust RAG configuration paradigm for high-stakes applications, such as clinical decision support.
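The fixed-size chunking the paper favours can be sketched in a few lines; the window and overlap values below are illustrative placeholders, not the settings evaluated in the study:

```python
def fixed_size_chunks(text: str, window: int = 200, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows with a small overlap.

    A small window with minimal overlap is the regime the paper reports
    as outperforming semantic chunking; the exact sizes here are assumed.
    """
    if overlap >= window:
        raise ValueError("overlap must be smaller than window")
    step = window - overlap  # advance by window minus the shared overlap
    return [text[i:i + window] for i in range(0, len(text), step)]
```

Consecutive chunks share their last/first `overlap` characters, so no span of the source text is lost at a chunk boundary.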

📝 Abstract
Large language models achieve high task performance yet often hallucinate or rely on outdated knowledge. Retrieval-augmented generation (RAG) addresses these gaps by coupling generation with external search. We analyse how hyperparameters influence speed and quality in RAG systems, covering Chroma and Faiss vector stores, chunking policies, cross-encoder re-ranking, and temperature, and we evaluate six metrics: faithfulness, answer correctness, answer relevancy, context precision, context recall, and answer similarity. Chroma processes queries 13% faster, whereas Faiss yields higher retrieval precision, revealing a clear speed-accuracy trade-off. Naive fixed-length chunking with small windows and minimal overlap outperforms semantic segmentation while remaining the quickest option. Re-ranking provides modest gains in retrieval quality yet increases runtime by roughly a factor of 5, so its usefulness depends on latency constraints. These results help practitioners balance computational cost and accuracy when tuning RAG systems for transparent, up-to-date responses. Finally, we re-evaluate the top configurations with a corrective RAG workflow and show that their advantages persist when the model can iteratively request additional evidence. We obtain a near-perfect context precision (99%), which demonstrates that RAG systems can achieve extremely high retrieval accuracy with the right combination of hyperparameters, with significant implications for applications where retrieval quality directly impacts downstream task performance, such as clinical decision support in healthcare.
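The re-ranking stage the abstract weighs against its roughly fivefold runtime cost can be sketched as a second pass over retrieved candidates. A real system would score each (query, document) pair jointly with a cross-encoder model; the lexical-overlap scorer below is a toy stand-in so the sketch stays self-contained:

```python
def rerank(query: str, candidates: list[str], score_fn, top_k: int = 3) -> list[str]:
    """Re-order first-stage retrieval results by a (query, doc) relevance score.

    score_fn stands in for a cross-encoder; because it sees the query and
    document together, it is more precise but far slower than vector lookup,
    which is the latency trade-off the paper quantifies.
    """
    return sorted(candidates, key=lambda doc: score_fn(query, doc), reverse=True)[:top_k]

def token_overlap(query: str, doc: str) -> float:
    """Toy stand-in scorer: fraction of query tokens that appear in the doc."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)
```

Example: `rerank("chunking strategies for rag", docs, token_overlap, top_k=2)` promotes the candidate sharing the most query tokens to the top of the list.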
Problem

Research questions and friction points this paper is trying to address.

Analyzing hyperparameter impact on RAG speed and quality
Evaluating trade-offs between retrieval precision and query speed
Optimizing chunking and re-ranking for accurate, efficient RAG systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes hyperparameter impact on RAG performance
Compares Chroma and Faiss vector stores efficiency
Evaluates chunking policies and re-ranking effects
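The vector-store comparison boils down to nearest-neighbour search over embeddings. As a baseline for what Chroma and Faiss accelerate, exact top-k retrieval by cosine similarity can be sketched as follows (both stores also offer approximate indexes that trade this exactness for the speed the paper measures):

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Exact (brute-force) nearest-neighbour search over (doc_id, vector) pairs."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]
```

This brute-force scan is what approximate indexes replace with sublinear lookups, at some cost in retrieval precision.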