🤖 AI Summary
Efficient and reproducible hyperparameter optimization (HPO) for Retrieval-Augmented Generation (RAG) configuration remains underexplored. Method: We conduct the first large-scale empirical benchmark, systematically evaluating five HPO algorithms—Bayesian optimization, TPE, random search, greedy search, and Hyperband—across five diverse, cross-domain datasets, including a newly constructed real-world product documentation scenario. All methods jointly optimize retrieval and generation components within the largest RAG HPO search space to date. Contribution/Results: Greedy search and iterative random search consistently outperform conventional sequential tuning, delivering stable improvements in both response quality and retrieval relevance. Notably, they converge rapidly with few evaluations, enabling low-cost, high-efficiency RAG tuning. We propose a “model-first” optimization strategy and publicly release all experimental configurations and data, establishing a new benchmark for automated RAG configuration optimization.
📝 Abstract
Finding the optimal Retrieval-Augmented Generation (RAG) configuration for a given use case can be complex and expensive. Motivated by this challenge, frameworks for RAG hyperparameter optimization (HPO) have recently emerged, yet their effectiveness has not been rigorously benchmarked. To address this gap, we present a comprehensive study involving 5 HPO algorithms over 5 datasets from diverse domains, including a new dataset of real-world product documentation collected for this work. Our study explores the largest HPO search space considered to date, optimizing two evaluation metrics. Analysis of the results shows that RAG HPO can be done efficiently, either greedily or with iterative random search, and that it significantly boosts RAG performance for all datasets. For greedy HPO approaches, we show that optimizing models first is preferable to the prevalent practice of optimizing sequentially according to the RAG pipeline order.
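To make the greedy, "model-first" strategy concrete, the sketch below tunes one RAG parameter at a time, fixing each to its best value before moving on, with the model choices ordered first. The search space, parameter names, and scoring function are all illustrative assumptions, not the paper's actual grid or evaluator; a real objective would run the RAG pipeline on a dev set.

```python
# Hypothetical RAG search space (illustrative only, not the paper's actual grid).
SEARCH_SPACE = {
    "generation_model": ["small-llm", "large-llm"],
    "embedding_model": ["minilm", "e5-base"],
    "chunk_size": [256, 512, 1024],
    "top_k": [3, 5, 10],
}

def evaluate(config):
    """Stand-in for a real RAG evaluation (e.g. answer quality on a dev set)."""
    score = 0.0
    score += {"small-llm": 0.5, "large-llm": 0.7}[config["generation_model"]]
    score += {"minilm": 0.1, "e5-base": 0.2}[config["embedding_model"]]
    score += {256: 0.02, 512: 0.05, 1024: 0.03}[config["chunk_size"]]
    score += {3: 0.01, 5: 0.03, 10: 0.02}[config["top_k"]]
    return score

def greedy_hpo(space, order, evaluate_fn):
    """Optimize one parameter at a time in the given order, keeping the best
    value for each before moving to the next."""
    # Start from the first value of every parameter.
    config = {name: values[0] for name, values in space.items()}
    evaluations = 0
    for name in order:
        best_value, best_score = config[name], None
        for value in space[name]:
            candidate = {**config, name: value}
            score = evaluate_fn(candidate)
            evaluations += 1
            if best_score is None or score > best_score:
                best_value, best_score = value, score
        config[name] = best_value  # fix this dimension before tuning the next
    return config, evaluations

# "Model-first": tune model choices before chunking and retrieval depth.
best, n_evals = greedy_hpo(
    SEARCH_SPACE,
    order=["generation_model", "embedding_model", "chunk_size", "top_k"],
    evaluate_fn=evaluate,
)
```

On this toy space, greedy search needs only 10 evaluations (the sum of the option counts) versus 36 for an exhaustive grid, which mirrors the paper's point that greedy HPO converges with few evaluations; the choice of `order` is exactly where "model-first" versus pipeline-order tuning differs.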