🤖 AI Summary
Counterfactual-based explanation methods are hard to apply to LLMs for two reasons: generated textual counterfactuals are often not meaningful or readable enough for users to compare mentally, and the approach does not scale to long-form text without support for creating and analyzing batches of perturbations at multiple granularity levels. This paper addresses both challenges. It contributes a novel algorithm that generates batches of complete, meaningful textual counterfactuals by removing and replacing text segments at different granularities, and LLM Analyzer, an interactive visualization tool that helps users understand an LLM's behavior by inspecting and aggregating these counterfactuals. Evaluated on 1,000 samples drawn from medical, legal, finance, education, and news datasets, 97.2% of the generated counterfactuals are grammatically correct. A use case, user studies, and expert feedback demonstrate the tool's usefulness and usability.
📝 Abstract
Counterfactual examples are useful for exploring the decision boundaries of machine learning models and determining feature attributions. How can we apply counterfactual-based methods to analyze and explain LLMs? We identify the following key challenges. First, the generated textual counterfactuals should be meaningful and readable to users, so that they can be mentally compared to draw conclusions. Second, to make the solution scale to long-form text, users need tools to create batches of counterfactuals from perturbations at various granularity levels and to interactively analyze the results. In this paper, we tackle the above challenges and contribute 1) a novel algorithm for generating batches of complete and meaningful textual counterfactuals by removing and replacing text segments at different granularities, and 2) LLM Analyzer, an interactive visualization tool that helps users understand an LLM's behaviors by interactively inspecting and aggregating meaningful counterfactuals. We evaluate the proposed algorithm by measuring the grammatical correctness of its generated counterfactuals on 1,000 samples from medical, legal, finance, education, and news datasets. In our experiments, 97.2% of the counterfactuals are grammatically correct. Through a use case, user studies, and feedback from experts, we demonstrate the usefulness and usability of the proposed interactive visualization tool.
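To make the removal-and-replacement idea concrete, the sketch below shows a minimal, hypothetical version of multi-granularity counterfactual generation: split a text into segments (here, words or sentences), then produce one variant per deleted segment and one per candidate substitute. The function names, the granularity choices, and the `substitutes` table are illustrative assumptions, not the paper's actual algorithm, which additionally enforces syntactic constraints to keep counterfactuals grammatical.

```python
def segment(text: str, granularity: str) -> list[str]:
    """Split text into units at the requested granularity (toy splitter)."""
    if granularity == "word":
        return text.split()
    if granularity == "sentence":
        return [s.strip() + "." for s in text.split(".") if s.strip()]
    raise ValueError(f"unsupported granularity: {granularity}")

def counterfactuals(text: str, granularity: str,
                    substitutes: dict[str, list[str]]) -> list[str]:
    """Generate counterfactuals by removing or substituting each segment."""
    units = segment(text, granularity)
    variants = []
    for i, unit in enumerate(units):
        # Deletion counterfactual: drop segment i entirely.
        variants.append(" ".join(units[:i] + units[i + 1:]))
        # Substitution counterfactuals: swap segment i for each candidate.
        for sub in substitutes.get(unit, []):
            variants.append(" ".join(units[:i] + [sub] + units[i + 1:]))
    return variants

if __name__ == "__main__":
    text = "The treatment was effective"
    subs = {"effective": ["ineffective", "harmful"]}
    for v in counterfactuals(text, "word", subs):
        print(v)
```

Feeding each variant back through the target LLM and comparing predictions against the original is what turns this batch into an attribution signal: segments whose removal or substitution flips the output matter most.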