CafGa: Customizing Feature Attributions to Explain Language Models

📅 2025-09-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional feature attribution methods operate at a fixed word-level granularity, which limits their ability to efficiently capture cross-word semantic units and offers users no control. To address this, the authors propose an interactive attribution tool with customizable granularity: users define arbitrary text segments and visualize deletion/insertion attribution curves, enabling dynamic, interpretable analysis of long texts. The approach integrates tunable granularity with established model-agnostic methods (e.g., SHAP, LIME), combining human-in-the-loop intervention with quantitative evaluation to overcome inherent limitations of token-level attribution. A user study shows that the tool improves explanation efficiency and readability, receives broad endorsement across diverse user backgrounds, and outperforms automated alternatives such as PartitionSHAP and MExGen.

📝 Abstract
Feature attribution methods, such as SHAP and LIME, explain machine learning model predictions by quantifying the influence of each input component. When applying feature attributions to explain language models, a basic question is defining the interpretable components. Traditional feature attribution methods commonly treat individual words as atomic units. This is highly computationally inefficient for long-form text and fails to capture semantic information that spans multiple words. To address this, we present CafGa, an interactive tool for generating and evaluating feature attribution explanations at customizable granularities. CafGa supports customized segmentation with user interaction and visualizes the deletion and insertion curves for explanation assessments. Through a user study involving participants of various expertise, we confirm CafGa's usefulness, particularly among LLM practitioners. Explanations created using CafGa were also perceived as more useful compared to those generated by two fully automatic baseline methods: PartitionSHAP and MExGen, suggesting the effectiveness of the system.
Problem

Research questions and friction points this paper is trying to address.

Traditional feature attribution methods treat words as atomic units
This approach is inefficient for long text and misses semantic spans
CafGa enables customizable granularity explanations for language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Customizable granularity for feature attribution explanations
Interactive segmentation with user-defined text components
Visual assessment via deletion and insertion curves
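The deletion and insertion curves mentioned above are a standard faithfulness check for attributions: remove (or restore) segments in order of attributed importance and watch how the model's score falls (or rises). A minimal sketch of that evaluation, using a hypothetical toy scoring function in place of a real language model (segment names, scores, and the `toy_score` model below are illustrative assumptions, not from the paper):

```python
from typing import Callable, List

def deletion_curve(segments: List[str],
                   attributions: List[float],
                   model_score: Callable[[List[str]], float],
                   mask_token: str = "[MASK]") -> List[float]:
    """Mask segments from most to least attributed, recording the model's
    score after each step. A steep drop suggests faithful attributions."""
    order = sorted(range(len(segments)), key=lambda i: -attributions[i])
    current = list(segments)
    curve = [model_score(current)]
    for i in order:
        current[i] = mask_token
        curve.append(model_score(current))
    return curve

def insertion_curve(segments: List[str],
                    attributions: List[float],
                    model_score: Callable[[List[str]], float],
                    mask_token: str = "[MASK]") -> List[float]:
    """Start fully masked and restore segments from most to least
    attributed; a fast rise suggests faithful attributions."""
    order = sorted(range(len(segments)), key=lambda i: -attributions[i])
    current = [mask_token] * len(segments)
    curve = [model_score(current)]
    for i in order:
        current[i] = segments[i]
        curve.append(model_score(current))
    return curve

def auc(curve: List[float]) -> float:
    """Trapezoidal area under the curve, normalized by the number of steps."""
    n = len(curve) - 1
    return sum((curve[k] + curve[k + 1]) / 2 for k in range(n)) / n

# Toy "model": fraction of important words still visible (assumption for demo).
important = {"excellent", "masterpiece"}
def toy_score(segs: List[str]) -> float:
    return sum(1.0 for s in segs if s in important) / len(important)

segs = ["the", "film", "was", "excellent", "a", "masterpiece"]
attrs = [0.0, 0.1, 0.0, 0.9, 0.05, 0.8]

del_curve = deletion_curve(segs, attrs, toy_score)
ins_curve = insertion_curve(segs, attrs, toy_score)
print(del_curve)  # drops quickly: the attributions rank the right segments
print(ins_curve)  # rises quickly for the same reason
```

With user-defined segments (as in CafGa), the same procedure applies to multi-word spans instead of single tokens, which is what makes coarser-grained explanations cheaper to evaluate on long texts.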