🤖 AI Summary
This paper addresses the lack of interpretable context attribution in generative question answering. We formulate context-segment importance identification as a Combinatorial Multi-Armed Bandit (CMAB) problem—the first such formulation—and propose an efficient attribution framework based on combinatorial Thompson sampling. Our method employs a normalized token-likelihood reward function, enabling high-precision attribution over the exponential subset space with low query overhead—significantly fewer queries than SHAP-based baselines. Experiments across multiple standard benchmarks and mainstream large language models demonstrate that our approach matches or surpasses state-of-the-art methods in attribution quality while substantially reducing computational cost. Key contributions: (1) the first CMAB-based modeling of context attribution; (2) a Pareto-improved trade-off between query efficiency and attribution accuracy; and (3) a lightweight, generalizable, and scalable solution for enhancing model interpretability.
📝 Abstract
Understanding which parts of the retrieved context contribute to a large language model's generated answer is essential for building interpretable and trustworthy generative QA systems. We propose a novel framework that formulates context attribution as a combinatorial multi-armed bandit (CMAB) problem. Each context segment is treated as a bandit arm, and we employ Combinatorial Thompson Sampling (CTS) to efficiently explore the exponentially large space of context subsets under a limited query budget. Our method defines a reward function based on normalized token likelihoods, capturing how well a subset of segments supports the original model response. Unlike traditional perturbation-based attribution methods such as SHAP, which sample subsets uniformly and incur high computational costs, our approach adaptively balances exploration and exploitation by leveraging posterior estimates of segment relevance. This leads to substantially improved query efficiency while maintaining high attribution fidelity. Extensive experiments on diverse datasets and LLMs demonstrate that our method achieves competitive attribution quality with fewer model queries.
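The core loop described above—per-segment posteriors, subset selection, and a bounded reward derived from token likelihoods—can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Beta posteriors with fractional updates, the fixed subset size, and the `reward_fn` interface are all assumptions chosen to make the CTS idea concrete.

```python
import random

def cts_attribution(num_segments, reward_fn, budget, subset_size):
    """Sketch of Combinatorial Thompson Sampling for context attribution.

    Each context segment is a bandit arm with a Beta posterior over its
    relevance. Each round: sample relevance estimates, query the model on
    the top-k subset, and update the posteriors of the queried arms with
    the observed reward (assumed here to be a normalized token likelihood
    in [0, 1]; fractional Beta updates are an assumption of this sketch).
    """
    alpha = [1.0] * num_segments  # uniform Beta(1, 1) priors
    beta = [1.0] * num_segments
    for _ in range(budget):
        # Draw a posterior sample of each segment's relevance
        theta = [random.betavariate(alpha[i], beta[i]) for i in range(num_segments)]
        # Exploit the sampled estimates: query the top-k segments as a subset
        subset = sorted(range(num_segments), key=lambda i: -theta[i])[:subset_size]
        r = reward_fn(subset)  # one model query per round
        for i in subset:
            alpha[i] += r
            beta[i] += 1.0 - r
    # Posterior means serve as the final attribution scores
    return [a / (a + b) for a, b in zip(alpha, beta)]

# Toy stand-in for the model-based reward: segments 0 and 2 are the ones
# that actually support the answer (hypothetical ground truth for the demo).
def toy_reward(subset):
    relevant = {0, 2}
    return len(relevant & set(subset)) / len(relevant)

random.seed(0)
scores = cts_attribution(num_segments=6, reward_fn=toy_reward,
                         budget=200, subset_size=2)
```

Under this toy reward, the two truly supportive segments accumulate the highest posterior means well within the 200-query budget, while uniform subset sampling (as in SHAP-style perturbation) would spend most of its queries on uninformative subsets.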