Multi-Head RAG: Solving Multi-Aspect Problems with LLMs

📅 2024-06-07
🏛️ arXiv.org
📈 Citations: 17
Influential: 0
🤖 AI Summary
Existing RAG methods struggle to retrieve all relevant documents when a query spans several distinct aspects, because a single embedding cannot encode every semantic dimension of the query at once. To address this, the paper proposes Multi-Head RAG (MRAG), which builds retrieval keys from the activations of the Transformer's multi-head attention layer instead of the decoder layer. Because different attention heads can capture different aspects of the data, the resulting embeddings make it possible to retrieve documents with substantially different contents for the same query. MRAG integrates with existing RAG frameworks and with different classes of data stores. The authors provide an evaluation methodology and metrics, release multi-aspect datasets, and show integration with benchmarking tools such as RAGAS. Experiments show improvements of up to 20% in relevance over standard RAG baselines on multi-aspect queries.

📝 Abstract
Retrieval Augmented Generation (RAG) enhances the abilities of Large Language Models (LLMs) by enabling the retrieval of documents into the LLM context to provide more accurate and relevant responses. Existing RAG solutions do not focus on queries that may require fetching multiple documents with substantially different contents. Such queries occur frequently, but are challenging because the embeddings of these documents may be distant in the embedding space, making it hard to retrieve them all. This paper introduces Multi-Head RAG (MRAG), a novel scheme designed to address this gap with a simple yet powerful idea: leveraging activations of Transformer's multi-head attention layer, instead of the decoder layer, as keys for fetching multi-aspect documents. The driving motivation is that different attention heads can learn to capture different data aspects. Harnessing the corresponding activations results in embeddings that represent various facets of data items and queries, improving the retrieval accuracy for complex queries. We provide an evaluation methodology and metrics, multi-aspect datasets that we release online, and real-world use cases to demonstrate MRAG's effectiveness, showing improvements of up to 20% in relevance over standard RAG baselines. MRAG can be seamlessly integrated with existing RAG frameworks and benchmarking tools like RAGAS as well as different classes of data stores.
Problem

Research questions and friction points this paper is trying to address.

Addresses retrieval of diverse documents for multi-aspect queries
Improves embedding accuracy using multi-head attention activations
Enhances LLM response relevance for complex information needs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses multi-head attention activations for retrieval
Improves multi-aspect document retrieval accuracy
Seamlessly integrates with existing RAG frameworks
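
The idea of using per-head activations as retrieval keys can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy 4-dimensional vectors stand in for real attention-layer activations, the function names (`split_heads`, `multi_head_retrieve`, `single_embedding_retrieve`) are hypothetical, and the merge rule (cosine score weighted by a rank decay) is one simple choice, not necessarily the voting scheme used in the paper.

```python
import math


def split_heads(vec, num_heads):
    """Split a flat activation vector into equal-sized per-head slices."""
    if len(vec) % num_heads != 0:
        raise ValueError("vector length must be divisible by num_heads")
    size = len(vec) // num_heads
    return [vec[i * size:(i + 1) * size] for i in range(num_heads)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_embedding_retrieve(query_vec, doc_vecs, k):
    """Baseline: rank documents by cosine similarity of the full vectors."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]),
                    reverse=True)
    return ranked[:k]


def multi_head_retrieve(query_vec, doc_vecs, num_heads, k):
    """Rank documents separately in each head's subspace, then merge.

    Assumed merge rule (illustrative only): each head contributes
    score * 2**(-rank) for its top-k documents.
    """
    q_heads = split_heads(query_vec, num_heads)
    d_heads = {d: split_heads(v, num_heads) for d, v in doc_vecs.items()}
    votes = {}
    for h in range(num_heads):
        scores = {d: cosine(q_heads[h], d_heads[d][h]) for d in doc_vecs}
        top = sorted(scores, key=scores.get, reverse=True)[:k]
        for rank, doc_id in enumerate(top):
            votes[doc_id] = votes.get(doc_id, 0.0) + scores[doc_id] * 2 ** (-rank)
    return sorted(votes, key=votes.get, reverse=True)[:k]


# Toy data: two heads of two dimensions each. The query asks about
# aspect A in head 0 and aspect B in head 1.
QUERY = [1.0, 0.0, 0.0, 1.0]
DOCS = {
    "doc_a": [1.0, 0.0, 1.0, 0.0],  # matches the query only in head 0
    "doc_b": [0.0, 1.0, 0.0, 1.0],  # matches the query only in head 1
    "doc_c": [1.0, 1.0, 1.0, 1.0],  # partially matches in both heads
    "doc_d": [0.0, 1.0, 1.0, 0.0],  # matches in neither head
}

if __name__ == "__main__":
    print("single embedding:", single_embedding_retrieve(QUERY, DOCS, 2))
    print("multi-head:      ", multi_head_retrieve(QUERY, DOCS, 2, 2))
```

In this toy setup the full-vector baseline ranks the generic `doc_c` first, while the per-head ranking surfaces the two aspect-specific documents `doc_a` and `doc_b`. In a real system the vectors would come from the embedding model's multi-head attention activations and the per-head searches would run against a vector store.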
Authors

Maciej Besta, ETH Zurich (Graph Computations, Effective & Efficient AI, Sparse Computations, High-Performance Computing)
Aleš Kubíček, ETH Zurich
Roman Niggli, ETH Zurich
Robert Gerstenberger, ETH Zurich
Lucas Weitzendorf, ETH Zurich
Mingyuan Chi, ETH Zurich
Patrick Iff, ETH Zurich
Joanna Gajda, Cledar
Piotr Nyczyk, Cledar
Jürgen Müller, Professor of Geodesy, Leibniz Universität Hannover (Lunar Laser Ranging, relativity, space and terrestrial gravimetry)
H. Niewiadomski, Cledar
Marcin Chrapek, ETH Zurich
Michal Podstawski, Warsaw University of Technology
Torsten Hoefler, Professor of Computer Science at ETH Zurich (High Performance Computing, Deep Learning, Networking, Message Passing Interface, Parallel and Distributed Computing)