Does Using Counterfactual Help LLMs Explain Textual Importance in Classification?

📅 2025-10-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing interpretability methods for large language models (LLMs) in text classification struggle with black-box, high-query-cost LLM APIs, where internal gradients or attention mechanisms are inaccessible. Method: This paper proposes a counterfactual reasoning–based keyword identification method centered on the Decision Change Rate (DCR) metric, a quantitative framework that systematically substitutes input tokens and measures the resulting label shifts to estimate each token's causal influence on model predictions. Unlike gradient- or attention-based white-box approaches, DCR requires only API-level access and makes no assumptions about model architecture or parameter availability. Contribution/Results: Evaluated across multiple text classification benchmarks, DCR achieves an average 12.7% improvement in keyword identification accuracy over baselines and demonstrates strong cross-model generalization. It establishes a novel, efficient, and non-intrusive paradigm for post-hoc interpretability analysis in resource-constrained, black-box LLM settings.

📝 Abstract
Large language models (LLMs) are becoming useful in many domains due to their impressive abilities that arise from large training datasets and large model sizes. More recently, they have been shown to be very effective in textual classification tasks, motivating the need to explain the LLMs' decisions. Motivated by practical constraints where LLMs are black-boxed and LLM calls are expensive, we study how incorporating counterfactuals into LLM reasoning can affect the LLM's ability to identify the top words that have contributed to its classification decision. To this end, we introduce a framework called the decision changing rate that helps us quantify the importance of the top words in classification. Our experimental results show that using counterfactuals can be helpful.
Problem

Research questions and friction points this paper is trying to address.

Evaluating counterfactuals' effect on LLMs' word importance identification
Developing a framework to quantify key words' impact on classification decisions
Addressing black-box constraints and cost limitations in LLM explanations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Incorporating counterfactuals into LLM reasoning
Introducing decision changing rate framework
Quantifying word importance through counterfactual analysis
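The decision changing rate can be sketched as a simple black-box procedure: substitute a candidate word with counterfactual alternatives, re-query the classifier, and record how often the predicted label flips. The sketch below is an illustration of that idea under stated assumptions, not the paper's implementation; `classify`, `decision_change_rate`, and the substitute list are hypothetical names.

```python
from typing import Callable, List

def decision_change_rate(
    text: str,
    word: str,
    substitutes: List[str],
    classify: Callable[[str], str],
) -> float:
    """Estimate a word's importance as the fraction of counterfactual
    substitutions that flip the black-box classifier's label."""
    original_label = classify(text)
    flips = 0
    for sub in substitutes:
        counterfactual = text.replace(word, sub)  # build the counterfactual input
        if classify(counterfactual) != original_label:
            flips += 1
    return flips / len(substitutes)

# Toy usage with a stand-in keyword classifier (in practice, an LLM API call).
classify = lambda t: "pos" if "great" in t else "neg"
score = decision_change_rate(
    "a great movie", "great", ["terrible", "greatest", "awful"], classify
)
```

A higher rate suggests the word carries more causal weight in the classification decision; in the black-box setting only label queries are needed, with query cost growing linearly in the number of substitutes.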
Nelvin Tan
American Express
James Asikin Cheung
American Express
Yu-Ching Shih
American Express
Dong Yang
American Express
Amol Salunkhe
Unknown affiliation
Generative AI · Agentic AI · Machine Learning · MLOps/LLMOps/AgentOps · XAI