Learning to Attribute with Attention

📅 2025-04-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Attributing generated tokens in large language models, i.e., identifying the most influential preceding tokens without costly per-token ablation, remains challenging. Method: The proposed AT2 treats the attention weights of individual attention heads as learnable features and calibrates their attribution ability via supervised learning, using ablation outcomes as ground-truth labels. Contribution/Results: AT2 overcomes the reliability limitations of heuristic approaches (e.g., averaging attention across heads), matching the fidelity of repeated-ablation baselines on token-level attribution while being roughly two orders of magnitude faster. Applied to context pruning in question answering, AT2 significantly improves answer accuracy, demonstrating strong generalization and practical utility.

📝 Abstract
Given a sequence of tokens generated by a language model, we may want to identify the preceding tokens that influence the model to generate this sequence. Performing such token attribution is expensive; a common approach is to ablate preceding tokens and directly measure their effects. To reduce the cost of token attribution, we revisit attention weights as a heuristic for how a language model uses previous tokens. Naive approaches to attribute model behavior with attention (e.g., averaging attention weights across attention heads to estimate a token's influence) have been found to be unreliable. To attain faithful attributions, we propose treating the attention weights of different attention heads as features. This way, we can learn how to effectively leverage attention weights for attribution (using signal from ablations). Our resulting method, Attribution with Attention (AT2), reliably performs on par with approaches that involve many ablations, while being significantly more efficient. To showcase the utility of AT2, we use it to prune less important parts of a provided context in a question answering setting, improving answer quality. We provide code for AT2 at https://github.com/MadryLab/AT2.
Problem

Research questions and friction points this paper is trying to address.

Identify influential preceding tokens efficiently
Improve token attribution using attention weights
Prune less important context for better answers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses attention weights as features
Learns to leverage attention for attribution
Efficiently prunes less important context parts
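The last point, pruning less important context, reduces to a simple selection step once per-token attribution scores are available. The sketch below is hypothetical (the function name, the keep fraction, and the toy scores are all illustrative, not the paper's API): keep the highest-scoring fraction of context tokens in their original order, then answer the question from the pruned context.

```python
def prune_context(tokens, scores, keep_fraction=0.5):
    """Keep the highest-scoring fraction of tokens, preserving original order."""
    k = max(1, int(len(tokens) * keep_fraction))
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])  # restore original token order
    return [tokens[i] for i in keep]

# Toy example: attribution scores mark the answer-bearing tokens.
tokens = ["The", "Eiffel", "Tower", "is", "in", "Paris", ",", "France", "."]
scores = [0.01, 0.30, 0.25, 0.02, 0.05, 0.40, 0.01, 0.20, 0.01]
pruned = prune_context(tokens, scores, keep_fraction=0.5)
# pruned -> ["Eiffel", "Tower", "Paris", "France"]
```

Order preservation matters here: the pruned context is fed back to the model as text, so shuffling tokens by score would destroy its readability.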