AttnTrace: Attention-based Context Traceback for Long-Context LLMs

📅 2025-08-05

📈 Citations: 0

✨ Influential: 0

career value

170K/year

🤖 AI Summary

To address the high computational cost and low accuracy of context attribution in large language models (LLMs) handling long contexts, this paper proposes an efficient attention-based attribution method. The method introduces context-sensitive attention reweighting and sparsification enhancement, accompanied by theoretical convergence analysis, and supports the attribution-before-detection paradigm for proactive detection of prompt injection attacks. By directly leveraging native attention weights and incorporating lightweight optimization strategies, it significantly reduces computational complexity. Experimental results demonstrate that the method improves attribution accuracy by 12.7% on average, accelerates inference by 3.2×, and precisely localizes manipulative instructions. It effectively enhances model interpretability and security in realistic long-document scenarios.

Technology Category

Application Category

📝 Abstract

Long-context large language models (LLMs), such as Gemini-2.5-Pro and Claude-Sonnet-4, are increasingly used to empower advanced AI systems, including retrieval-augmented generation (RAG) pipelines and autonomous agents. In these systems, an LLM receives an instruction along with a context--often consisting of texts retrieved from a knowledge database or memory--and generates a response that is contextually grounded by following the instruction. Recent studies have designed solutions to trace back to a subset of texts in the context that contributes most to the response generated by the LLM. These solutions have numerous real-world applications, including performing post-attack forensic analysis and improving the interpretability and trustworthiness of LLM outputs. While significant efforts have been made, state-of-the-art solutions such as TracLLM often lead to a high computation cost, e.g., it takes TracLLM hundreds of seconds to perform traceback for a single response-context pair. In this work, we propose AttnTrace, a new context traceback method based on the attention weights produced by an LLM for a prompt. To effectively utilize attention weights, we introduce two techniques designed to enhance the effectiveness of AttnTrace, and we provide theoretical insights for our design choice. We also perform a systematic evaluation for AttnTrace. The results demonstrate that AttnTrace is more accurate and efficient than existing state-of-the-art context traceback methods. We also show that AttnTrace can improve state-of-the-art methods in detecting prompt injection under long contexts through the attribution-before-detection paradigm. As a real-world application, we demonstrate that AttnTrace can effectively pinpoint injected instructions in a paper designed to manipulate LLM-generated reviews. The code is at https://github.com/Wang-Yanting/AttnTrace.

Problem

Research questions and friction points this paper is trying to address.

Efficiently trace context contributions in long-context LLMs

Reduce high computation costs in existing traceback methods

Improve interpretability and trustworthiness of LLM outputs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention-based context traceback method

Enhances traceback accuracy and efficiency

Improves prompt injection detection

🔎 Similar Papers

Racing Thoughts: Explaining Large Language Model Contextualization Errors