AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution

📅 2024-11-22
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Standard leave-one-out (LOO) context attribution for large language models (LLMs) incurs prohibitive computational overhead, hindering its practical use in interpretability analysis. Method: The paper proposes AttriBoT, an efficient framework for approximating the LOO error that caches and reuses activations to avoid redundant computation, performs hierarchical context attribution to reduce the number of ablations, and emulates large target models with smaller proxy models. Contribution/Results: AttriBoT accelerates LOO computation by over 300× relative to exact LOO, making per-response attribution roughly 30× faster than generating the response itself. It remains more faithful to the target model's LOO error than prior context attribution methods while staying scalable and easy to use. An open-source implementation enables large-scale, low-barrier use of efficient LLM context attribution.

📝 Abstract
The influence of contextual input on the behavior of large language models (LLMs) has prompted the development of context attribution methods that aim to quantify each context span's effect on an LLM's generations. The leave-one-out (LOO) error, which measures the change in the likelihood of the LLM's response when a given span of the context is removed, provides a principled way to perform context attribution, but can be prohibitively expensive to compute for large models. In this work, we introduce AttriBoT, a series of novel techniques for efficiently computing an approximation of the LOO error for context attribution. Specifically, AttriBoT uses cached activations to avoid redundant operations, performs hierarchical attribution to reduce computation, and emulates the behavior of large target models with smaller proxy models. Taken together, AttriBoT can provide a >300x speedup while remaining more faithful to a target model's LOO error than prior context attribution methods. This stark increase in performance makes computing context attributions for a given response 30x faster than generating the response itself, empowering real-world applications that require computing attributions at scale. We release a user-friendly and efficient implementation of AttriBoT to enable efficient LLM interpretability as well as encourage future development of efficient context attribution methods.
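The LOO error the abstract describes — the drop in the response's likelihood when one context span is ablated — can be sketched generically. This is a minimal illustration, not the paper's implementation: `log_likelihood` is a hypothetical stand-in for an LLM scorer (e.g. the summed token log-probabilities of the response given the context).

```python
from typing import Callable, List

def loo_attributions(
    spans: List[str],
    response: str,
    log_likelihood: Callable[[List[str], str], float],
) -> List[float]:
    """Score each context span by the drop in response log-likelihood
    when that span is removed. Exact LOO needs one extra forward pass
    per span, which is what makes it expensive for large models.

    `log_likelihood` is an assumed helper, not the paper's API.
    """
    full = log_likelihood(spans, response)
    scores = []
    for i in range(len(spans)):
        ablated = spans[:i] + spans[i + 1:]  # context with span i removed
        scores.append(full - log_likelihood(ablated, response))
    return scores
```

With a real LLM scorer, each call to `log_likelihood` is a forward pass over the (ablated) context plus response, so the cost of exact LOO scales linearly with the number of spans — the overhead AttriBoT's caching, hierarchy, and proxy-model tricks aim to cut.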
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Contextual Influence Estimation
Educational Simplification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Efficient Context Attribution
Hierarchical Attribution Method
Approximation with Small Models
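The hierarchical attribution idea listed above can be sketched as a coarse-to-fine procedure: score coarse spans first, then ablate finer sub-units only inside spans whose LOO score clears a threshold, keeping the rest of the context fixed. This is an assumed reading of the technique, not the paper's code; `log_likelihood` and `split` are hypothetical helpers.

```python
from typing import Callable, List, Tuple

def hierarchical_loo(
    spans: List[str],
    response: str,
    log_likelihood: Callable[[List[str], str], float],
    split: Callable[[str], List[str]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Coarse-to-fine LOO sketch: only spans with a coarse LOO score
    >= threshold are re-ablated at the sub-span level, saving the
    forward passes that exact fine-grained LOO would spend on
    low-influence spans. Helpers here are illustrative assumptions."""
    full = log_likelihood(spans, response)
    results: List[Tuple[str, float]] = []
    for i, span in enumerate(spans):
        coarse = full - log_likelihood(spans[:i] + spans[i + 1:], response)
        subs = split(span)
        if coarse >= threshold and len(subs) > 1:
            # Refine: ablate each sub-unit with the rest of the context fixed.
            for j, sub in enumerate(subs):
                ablated = spans[:i] + subs[:j] + subs[j + 1:] + spans[i + 1:]
                results.append((sub, full - log_likelihood(ablated, response)))
        else:
            results.append((span, coarse))
    return results
```

If most spans fall below the threshold, the number of model calls stays close to the coarse pass alone, which is one way a hierarchical scheme can cut attribution cost without touching the model itself.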