AI Summary
This work addresses attribution dilution in local model-agnostic explanation methods when they are applied to large language models with long contexts, where high feature dimensionality hinders precise feature-level interpretability. To overcome this limitation, the authors propose Focus-LIME, a framework built around a surrogate-model-guided coarse-to-fine explanation mechanism. Focus-LIME first uses a surrogate model to identify an informative perturbation neighborhood, and then performs context-aware perturbations and fine-grained feature attribution within the refined context. This design mitigates attribution dilution in long-context settings and significantly improves explanation fidelity across multiple benchmarks. Notably, Focus-LIME enables surgical-level, precise explanations in long-context scenarios for the first time, establishing a practical foundation for deploying interpretable AI in high-stakes applications.
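The coarse-to-fine mechanism described above can be sketched in miniature. The code below is an illustrative toy, not the paper's implementation: `coarse_select`, `fine_attribution`, and the `surrogate_score`/`predict` callables are hypothetical names, and the fine stage uses a simplified locality-weighted mean-difference score in place of LIME's weighted ridge fit.

```python
import math
import random

def coarse_select(segments, surrogate_score, k):
    """Coarse stage: rank long-context segments with a cheap surrogate
    model and keep only the top-k most informative ones."""
    return sorted(segments, key=surrogate_score, reverse=True)[:k]

def fine_attribution(predict, tokens, n_samples=500, seed=0):
    """Fine stage: perturbation-based attribution on the refined context.
    Tokens are randomly masked; each token's score is the locality-weighted
    mean prediction with the token present minus with it absent
    (a simplified stand-in for LIME's weighted linear surrogate fit)."""
    rng = random.Random(seed)
    n = len(tokens)
    sums = [[0.0, 0.0] for _ in range(n)]      # weighted y-sums: [absent, present]
    wts = [[1e-12, 1e-12] for _ in range(n)]   # weight totals per state
    for _ in range(n_samples):
        mask = [rng.random() < 0.5 for _ in range(n)]
        y = predict([t for t, m in zip(tokens, mask) if m])
        d = mask.count(False) / max(n, 1)      # distance from the full input
        w = math.exp(-(d * d) / 0.25)          # exponential locality kernel
        for i, m in enumerate(mask):
            sums[i][m] += w * y                # bool m indexes absent/present
            wts[i][m] += w
    return [sums[i][1] / wts[i][1] - sums[i][0] / wts[i][0] for i in range(n)]
```

In use, the surrogate prunes the context before any expensive target-model calls, so the perturbation budget is spent only on the curated segments, which is the source of the fidelity gain in long-context settings.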
Abstract
As Large Language Models (LLMs) scale to massive context windows, surgical feature-level interpretation becomes essential for high-stakes tasks such as legal auditing and code debugging. However, existing local model-agnostic explanation methods break down in these scenarios: feature-based methods suffer from attribution dilution caused by high feature dimensionality and thus fail to provide faithful explanations. In this paper, we propose Focus-LIME, a coarse-to-fine framework designed to restore the tractability of surgical interpretation. Focus-LIME utilizes a surrogate model to curate the perturbation neighborhood, allowing fine-grained attribution of the target model to be performed exclusively within the curated context. Empirical evaluations on long-context benchmarks demonstrate that our method makes surgical explanation practicable and yields faithful explanations for users.