Does Using Counterfactual Help LLMs Explain Textual Importance in Classification?

📅 2025-10-05

📈 Citations: 0

✨ Influential: 0

career value

160K/year

🤖 AI Summary

Existing interpretability methods for large language models (LLMs) in text classification struggle with black-box, high-query-cost LLM APIs, where internal gradients or attention mechanisms are inaccessible. Method: This paper proposes a counterfactual reasoning–based keyword identification method centered on the Decision Change Rate (DCR) metric—a quantitative framework that systematically substitutes input tokens and measures resultant label shifts to estimate each token’s causal influence on model predictions. Unlike gradient- or attention-based white-box approaches, DCR requires only API-level access and imposes no assumptions about model architecture or parameter availability. Contribution/Results: Evaluated across multiple text classification benchmarks, DCR achieves an average 12.7% improvement in keyword identification accuracy over baselines and demonstrates strong cross-model generalization. It establishes a novel, efficient, and non-intrusive paradigm for post-hoc interpretability analysis in resource-constrained, black-box LLM settings.

Technology Category

Application Category

📝 Abstract

Large language models (LLMs) are becoming useful in many domains due to their impressive abilities that arise from large training datasets and large model sizes. More recently, they have been shown to be very effective in textual classification tasks, motivating the need to explain the LLMs' decisions. Motivated by practical constrains where LLMs are black-boxed and LLM calls are expensive, we study how incorporating counterfactuals into LLM reasoning can affect the LLM's ability to identify the top words that have contributed to its classification decision. To this end, we introduce a framework called the decision changing rate that helps us quantify the importance of the top words in classification. Our experimental results show that using counterfactuals can be helpful.

Problem

Research questions and friction points this paper is trying to address.

Evaluating counterfactuals' effect on LLMs' word importance identification

Developing framework to quantify key words' impact on classification decisions

Addressing black-box constraints and cost limitations in LLM explanations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Incorporating counterfactuals into LLM reasoning

Introducing decision changing rate framework

Quantifying word importance through counterfactual analysis

🔎 Similar Papers

No similar papers found.