AI Summary
This work addresses the challenge that Retrieval-Augmented Generation (RAG) system performance is heavily influenced by complex interactions among multiple modules, resulting in a vast and opaque configuration space that hinders effective design optimization. To this end, we propose the first visualization-driven diagnostic framework for RAG systems that integrates macro- and micro-level analysis, enabling developers to navigate from global performance overviews down to fine-grained failure cases. The framework incorporates interactive context intervention and error attribution mechanisms, combining visual analytics, embedding model evaluation, and an interactive interface to facilitate efficient configuration comparison and hypothesis validation. Case studies and user experiments demonstrate that our approach significantly enhances developers' understanding of RAG behavior and their ability to tune system performance. The implementation is publicly released.
Abstract
The advent of Retrieval-Augmented Generation (RAG) has significantly enhanced the ability of Large Language Models (LLMs) to produce factually accurate and up-to-date responses. However, the performance of a RAG system is not determined by a single component but emerges from a complex interplay of modular choices, such as embedding models and retrieval algorithms. This creates a vast and often opaque configuration space, making it challenging for developers to understand performance trade-offs and identify optimal designs. To address this challenge, we present RAGExplorer, a visual analytics system for the systematic comparison and diagnosis of RAG configurations. RAGExplorer guides users through a seamless macro-to-micro analytical workflow. Initially, it empowers developers to survey the performance landscape across numerous configurations, allowing for a high-level understanding of which design choices are most effective. For a deeper analysis, the system enables users to drill down into individual failure cases, investigate how differences in retrieved information contribute to errors, and interactively test hypotheses by manipulating the provided context to observe the resulting impact on the generated answer. We demonstrate the effectiveness of RAGExplorer through detailed case studies and user studies, validating its ability to empower developers in navigating the complex RAG design space. Our code and user guide are publicly available at https://github.com/Thymezzz/RAGExplorer.
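To make the "vast configuration space" concrete, the following is a minimal sketch of how modular choices multiply into a grid of configurations to compare. The option names, the `top_k` dimension, and the `enumerate_configs` helper are hypothetical placeholders chosen for illustration; they are not taken from RAGExplorer's code or settings.

```python
from itertools import product

# Hypothetical module options (placeholders, not RAGExplorer's actual settings).
CONFIG_SPACE = {
    "embedding_model": ["embed-small", "embed-large"],
    "retrieval_algorithm": ["dense", "sparse", "hybrid"],
    "top_k": [3, 5, 10],  # assumed extra dimension, not named in the abstract
}


def enumerate_configs(space):
    """Return every combination of module choices as a list of dicts."""
    keys = list(space)
    return [
        dict(zip(keys, values))
        for values in product(*(space[key] for key in keys))
    ]


configs = enumerate_configs(CONFIG_SPACE)
print(len(configs))  # 2 * 3 * 3 = 18 configurations
```

Even three small dimensions yield 18 configurations, and each added module or option multiplies the count, which is why a high-level overview of the performance landscape is useful before drilling into individual failure cases.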