Multi-Head RAG: Solving Multi-Aspect Problems with LLMs

📅 2024-06-07

🏛️ arXiv.org

📈 Citations: 17

✨ Influential: 0

🤖 AI Summary

Existing RAG methods struggle to retrieve semantically diverse documents for multifaceted queries, because a single embedding space cannot simultaneously encode several semantic dimensions of a query. To address this, the paper proposes MRAG (Multi-Head RAG), which builds retrieval keys from the activations of the Transformer's multi-head attention layer rather than the decoder layer, so that each attention head can capture a distinct semantic dimension and heterogeneous documents can be recalled along multiple aspects. MRAG is compatible with existing RAG architectures and diverse data backends. The authors provide multi-aspect datasets and an evaluation methodology, and MRAG integrates with benchmarking tools such as RAGAS. Experiments show relevance improvements of up to 20% over standard RAG baselines on multifaceted queries.

📝 Abstract

Retrieval Augmented Generation (RAG) enhances the abilities of Large Language Models (LLMs) by enabling the retrieval of documents into the LLM context to provide more accurate and relevant responses. Existing RAG solutions do not focus on queries that may require fetching multiple documents with substantially different contents. Such queries occur frequently, but are challenging because the embeddings of these documents may be distant in the embedding space, making it hard to retrieve them all. This paper introduces Multi-Head RAG (MRAG), a novel scheme designed to address this gap with a simple yet powerful idea: leveraging activations of Transformer's multi-head attention layer, instead of the decoder layer, as keys for fetching multi-aspect documents. The driving motivation is that different attention heads can learn to capture different data aspects. Harnessing the corresponding activations results in embeddings that represent various facets of data items and queries, improving the retrieval accuracy for complex queries. We provide an evaluation methodology and metrics, multi-aspect datasets that we release online, and real-world use cases to demonstrate MRAG's effectiveness, showing improvements of up to 20% in relevance over standard RAG baselines. MRAG can be seamlessly integrated with existing RAG frameworks and benchmarking tools like RAGAS as well as different classes of data stores.
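The core idea in the abstract, using per-head activations as separate retrieval keys, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`split_heads`, `cosine_scores`, `multi_head_retrieve`) are hypothetical, the vectors are hand-made toy data standing in for real attention activations, and the reciprocal-rank voting used to merge per-head results is an assumption chosen for simplicity.

```python
import numpy as np


def split_heads(activation, num_heads):
    """Split a concatenated multi-head activation into per-head embeddings.

    Assumes the last axis holds num_heads equally sized head outputs.
    """
    activation = np.asarray(activation, dtype=float)
    if activation.shape[-1] % num_heads != 0:
        raise ValueError("activation size must be divisible by num_heads")
    return activation.reshape(*activation.shape[:-1], num_heads, -1)


def cosine_scores(query, docs):
    """Cosine similarity between one query vector (d,) and doc vectors (n, d)."""
    q = query / (np.linalg.norm(query) + 1e-12)
    d = docs / (np.linalg.norm(docs, axis=1, keepdims=True) + 1e-12)
    return d @ q


def multi_head_retrieve(query_act, doc_acts, num_heads, top_k=2):
    """Rank documents in each head's embedding space, then merge by voting.

    The reciprocal-rank vote is an illustrative assumption, not the
    merging strategy described in the paper.
    """
    q_heads = split_heads(query_act, num_heads)   # shape (h, d_h)
    d_heads = split_heads(doc_acts, num_heads)    # shape (n, h, d_h)
    votes = np.zeros(d_heads.shape[0])
    for h in range(num_heads):
        scores = cosine_scores(q_heads[h], d_heads[:, h, :])
        ranked = np.argsort(-scores)[:top_k]
        for rank, doc_id in enumerate(ranked):
            votes[doc_id] += 1.0 / (rank + 1)
    best = np.argsort(-votes, kind="stable")[:top_k]
    return [int(i) for i in best]


# Toy data: 2 heads of size 2. Document 0 matches the query only in head 0,
# document 1 only in head 1, and document 2 matches in neither head.
query = np.array([1.0, 0.0, 0.0, 1.0])
docs = np.array([
    [1.0, 0.0, -1.0, 0.0],
    [-1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, -1.0],
])
print(multi_head_retrieve(query, docs, num_heads=2, top_k=2))  # prints [0, 1]
```

In this toy case each of the two relevant documents is the top hit in a different head, which is the multi-aspect behaviour the abstract motivates. In a real system the activations would come from a model's attention layer and the per-head indexes would live in a vector store.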

Problem

Research questions and friction points this paper is trying to address.

Addresses retrieval of diverse documents for multi-aspect queries

Improves embedding accuracy using multi-head attention activations

Enhances LLM response relevance for complex information needs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses multi-head attention activations for retrieval

Improves multi-aspect document retrieval accuracy

Seamlessly integrates with existing RAG frameworks

🔎 Similar Papers

Does RAG Introduce Unfairness in LLMs? Evaluating Fairness in Retrieval-Augmented Generation Systems