🤖 AI Summary
Existing retrieval-augmented generation (RAG) systems are typically configured by heuristics, which hinders systematic evaluation and reproducibility. This work formalizes RAG design as an architecture search problem and introduces RAISE, a unified benchmark for controlled, reproducible research on RAG hyperparameter optimization. Under a standardized search space and computational budget, it integrates 13 search algorithms and evaluates them on seven text and multimodal datasets. The results show that the effectiveness of optimization strategies is highly task-dependent: no single method consistently outperforms the others across all settings. These findings caution against inferring universal superiority from aggregate rankings and underscore the need for task-specific evaluation in RAG system design.
📝 Abstract
Retrieval-augmented generation (RAG) systems expose numerous design choices spanning query rewriting, chunking, retrieval depth, reranking, and context compression. In practice, these choices are often configured through heuristics, hindering systematic evaluation and reproducibility across settings. We argue that this challenge is best formulated as RAG architecture search. To support controlled and reproducible study of this problem, we introduce the RAG Intelligence Search Engine (RAISE), a comprehensive framework and benchmark for RAG hyperparameter optimization, which evaluates optimization methods for RAG pipelines under standardized search spaces and budgets. RAISE implements 13 search algorithms and evaluates them across seven public text and multimodal datasets using three random seeds. Our experiments show that optimization performance is highly task-dependent: methods that perform strongly on one dataset may not generalize consistently across others, cautioning against interpreting aggregate rankings as evidence of universally superior strategies. RAISE provides a common experimental substrate for fair, reproducible, and systematic research on RAG hyperparameter optimization.
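To make the setup concrete, the following is a minimal sketch of what a budgeted search over RAG design choices might look like, reported per dataset and averaged over three seeds. It is not the RAISE implementation: the search space values, the dataset names, the `evaluate` scoring function, and the budget of 20 trials are all hypothetical, and plain random search stands in for whichever of the 13 algorithms one would actually compare.

```python
import random
from statistics import mean

# Hypothetical search space: the parameter names follow the design choices
# listed in the abstract, but the candidate values are illustrative only.
SEARCH_SPACE = {
    "query_rewriting": [False, True],
    "chunk_size": [128, 256, 512, 1024],
    "retrieval_depth": [1, 3, 5, 10, 20],
    "reranking": [False, True],
    "context_compression": [False, True],
}


def sample_config(rng):
    """Draw one pipeline configuration uniformly from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def evaluate(config, dataset):
    """Toy stand-in for running a RAG pipeline and scoring it.

    Each hypothetical dataset prefers a different retrieval depth, which
    mimics task-dependent optima. A real benchmark would execute the
    pipeline on the dataset and return a task metric instead.
    """
    preferred_depth = {"toy_text_qa": 5, "toy_multimodal_qa": 10}[dataset]
    score = 1.0 - abs(config["retrieval_depth"] - preferred_depth) / 20
    if config["reranking"]:
        score += 0.05
    return score


def random_search(dataset, budget, seed):
    """Evaluate `budget` random configurations and keep the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(budget):
        config = sample_config(rng)
        score = evaluate(config, dataset)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def run_benchmark(datasets, budget=20, seeds=(0, 1, 2)):
    """Return the mean best score per dataset, keeping datasets separate."""
    return {
        dataset: mean(random_search(dataset, budget, seed)[1] for seed in seeds)
        for dataset in datasets
    }


results = run_benchmark(["toy_text_qa", "toy_multimodal_qa"])
for dataset, score in results.items():
    print(f"{dataset}: mean best score over seeds = {score:.3f}")
```

Keeping the results keyed by dataset, rather than collapsing them into a single average, reflects the abstract's caution that aggregate rankings can hide task-dependent behavior.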