🤖 AI Summary
Existing RAG methods struggle to retrieve semantically diverse, multi-aspect relevant documents, because a single embedding space cannot simultaneously encode several semantic dimensions of a query. To address this, the authors propose Multi-Head RAG (MRAG), a method that builds multi-aspect retrieval keys from the activations of the Transformer multi-head attention layer, so that each attention head can capture a distinct semantic dimension and heterogeneous documents can be recalled together. MRAG requires no modification to the language model’s output head and is compatible with existing RAG architectures and different classes of data stores. The authors release multi-aspect query datasets, provide an evaluation methodology and metrics, and show that MRAG integrates with benchmarking tools such as RAGAS. Experiments show that MRAG improves relevance by up to 20% over standard RAG baselines, mitigating missed documents and retrieval bias on multi-aspect queries.
📝 Abstract
Retrieval Augmented Generation (RAG) enhances the abilities of Large Language Models (LLMs) by enabling the retrieval of documents into the LLM context to provide more accurate and relevant responses. Existing RAG solutions do not focus on queries that may require fetching multiple documents with substantially different contents. Such queries occur frequently, but are challenging because the embeddings of these documents may be distant in the embedding space, making it hard to retrieve them all. This paper introduces Multi-Head RAG (MRAG), a novel scheme designed to address this gap with a simple yet powerful idea: leveraging activations of Transformer's multi-head attention layer, instead of the decoder layer, as keys for fetching multi-aspect documents. The driving motivation is that different attention heads can learn to capture different data aspects. Harnessing the corresponding activations results in embeddings that represent various facets of data items and queries, improving the retrieval accuracy for complex queries. We provide an evaluation methodology and metrics, multi-aspect datasets that we release online, and real-world use cases to demonstrate MRAG's effectiveness, showing improvements of up to 20% in relevance over standard RAG baselines. MRAG can be seamlessly integrated with existing RAG frameworks and benchmarking tools like RAGAS as well as different classes of data stores.
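The core idea, one embedding per attention head rather than one embedding per item, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`split_heads`, `multi_head_retrieve`), the toy activation vectors, and the reciprocal-rank voting used to merge the per-head results are all assumptions made for illustration, since the abstract does not specify how per-head rankings are combined.

```python
import math
from collections import defaultdict


def split_heads(activation, num_heads):
    """Split a flat attention-layer activation into one slice per head."""
    if len(activation) % num_heads != 0:
        raise ValueError("activation length must be divisible by num_heads")
    d = len(activation) // num_heads
    return [activation[h * d:(h + 1) * d] for h in range(num_heads)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def multi_head_retrieve(query_act, doc_acts, num_heads, k_per_head=1, top_k=2):
    """Rank documents separately in each head's embedding space, then merge.

    The merge rule (reciprocal-rank voting) is an assumption of this sketch,
    not something stated in the abstract.
    """
    q_heads = split_heads(query_act, num_heads)
    d_heads = {doc_id: split_heads(act, num_heads)
               for doc_id, act in doc_acts.items()}
    votes = defaultdict(float)
    for h in range(num_heads):
        ranked = sorted(
            d_heads,
            key=lambda doc_id: cosine(q_heads[h], d_heads[doc_id][h]),
            reverse=True,
        )
        for rank, doc_id in enumerate(ranked[:k_per_head]):
            votes[doc_id] += 1.0 / (rank + 1)
    # Highest vote first; ties broken by document id for determinism.
    return sorted(votes, key=lambda doc_id: (-votes[doc_id], doc_id))[:top_k]


# Toy data (hypothetical): 2 heads with 2 dimensions each.
query = [1.0, 0.0, 0.0, 1.0]
docs = {
    "a": [1.0, 0.0, 1.0, 0.0],  # matches the query only in head 0
    "b": [0.0, 1.0, 0.0, 1.0],  # matches the query only in head 1
    "c": [0.0, 1.0, 1.0, 0.0],  # matches the query in neither head
}
print(multi_head_retrieve(query, docs, num_heads=2))  # ['a', 'b']
```

In this toy setup, documents `a` and `b` each match the query in a different head, so per-head retrieval surfaces both. With the full concatenated vector, each scores only 0.5 in cosine similarity, which shows how documents covering different aspects can look only moderately relevant in a single embedding space.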