The FIX Benchmark: Extracting Features Interpretable to eXperts

📅 2024-09-20
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Problem: Existing feature interpretability methods align poorly with domain-specific expert knowledge, particularly in high-dimensional data where such knowledge is difficult to formalize mathematically. Method: This paper introduces FIX (Features Interpretable to eXperts), the first benchmark for measuring how well features align with expert knowledge, built through interdisciplinary collaboration across cosmology, psychology, and medicine, and across vision, language, and time-series modalities. FIX establishes a structured knowledge encoding framework and a human-in-the-loop evaluation protocol, culminating in a unified quantitative metric: FIXScore. Contribution/Results: FIX enables the first expert-driven, cross-domain, cross-modal assessment of feature group consistency. Evaluated on six real-world tasks, mainstream methods (e.g., Grad-CAM, SHAP) achieve FIXScores consistently below 0.3, revealing severe misalignment with expert judgment. FIX provides a reproducible, comparable, and domain-grounded evaluation paradigm for explainable AI.

📝 Abstract
Feature-based methods are commonly used to explain model predictions, but these methods often implicitly assume that interpretable features are readily available. However, this is often not the case for high-dimensional data, and it can be hard even for domain experts to mathematically specify which features are important. Can we instead automatically extract collections or groups of features that are aligned with expert knowledge? To address this gap, we present FIX (Features Interpretable to eXperts), a benchmark for measuring how well a collection of features aligns with expert knowledge. In collaboration with domain experts, we propose FIXScore, a unified expert alignment measure applicable to diverse real-world settings across cosmology, psychology, and medicine domains in vision, language, and time series data modalities. With FIXScore, we find that popular feature-based explanation methods have poor alignment with expert-specified knowledge, highlighting the need for new methods that can better identify features interpretable to experts.
Problem

Research questions and friction points this paper is trying to address.

Extracting expert-aligned features from high-dimensional data
Measuring feature alignment with expert knowledge across domains
Showing that current explanation methods align poorly with expert knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmarks how well automatically extracted feature groups align with expert knowledge
Introduces FIXScore for expert alignment measurement
Evaluates feature explanation methods across domains
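
This page does not give the formula behind FIXScore, so the following is only a minimal sketch of the general idea of scoring a collection of feature groups against expert knowledge. Everything in it is an assumption made for illustration: the function names `purity_alignment` and `group_alignment_score`, the toy expert labels, and the aggregation rule (each feature receives the mean alignment of the groups that contain it, and the score is the average over all features). It should not be read as the paper's definition.

```python
from collections import Counter
from typing import Callable, Sequence, Set

Group = Set[int]  # one feature group = a set of feature indices


def purity_alignment(group: Group, expert_labels: Sequence[int]) -> float:
    """Hypothetical expert-alignment function: the share of the group that
    falls in its most common expert-annotated region (1.0 = one region only)."""
    if not group:
        return 0.0
    counts = Counter(expert_labels[i] for i in group)
    return counts.most_common(1)[0][1] / len(group)


def group_alignment_score(
    groups: Sequence[Group],
    num_features: int,
    align: Callable[[Group], float],
) -> float:
    """Assumed aggregation: average over features of the mean alignment of
    the groups containing each feature. Uncovered features contribute 0."""
    if num_features <= 0:
        raise ValueError("num_features must be positive")
    group_scores = [align(g) for g in groups]
    total = 0.0
    for i in range(num_features):
        containing = [s for g, s in zip(groups, group_scores) if i in g]
        if containing:
            total += sum(containing) / len(containing)
    return total / num_features


# Toy data (made up): six features, and an expert assigns each to region 0 or 1.
expert_labels = [0, 0, 0, 1, 1, 1]


def align(group: Group) -> float:
    return purity_alignment(group, expert_labels)


# Groups that match the expert regions versus groups that mix them.
aligned_score = group_alignment_score([{0, 1, 2}, {3, 4, 5}], 6, align)
mixed_score = group_alignment_score([{0, 1, 3}, {2, 4, 5}], 6, align)

print(f"aligned groups: {aligned_score:.3f}")
print(f"mixed groups:   {mixed_score:.3f}")
```

In this toy setup, groups that coincide with the expert regions score higher than groups that mix regions, which is the kind of comparison a benchmark like FIX is meant to support. In a real setting the alignment function would encode the domain expert's criteria for the task rather than a simple purity measure.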
Authors

Helen Jin
University of Pennsylvania, previously Columbia University
Explainability, Machine Learning, Natural Language Processing, Artificial Intelligence

Shreya Havaldar
Student, University of Pennsylvania
Natural Language Processing, Computational Social Science

Chaehyeon Kim
University of Pennsylvania

Anton Xue
University of Texas at Austin
Machine Learning, Explainability, Optimization, Formal Methods

Weiqiu You
PhD student, University of Pennsylvania
natural language processing

Helen Qu
Department of Physics and Astronomy, University of Pennsylvania

Marco Gatti
Department of Physics and Astronomy, University of Pennsylvania

Daniel A. Hashimoto
Department of Surgery, Perelman School of Medicine, University of Pennsylvania

Bhuvnesh Jain
Department of Physics and Astronomy, University of Pennsylvania

Amin Madani
Department of Surgery, University of Toronto

Masao Sako
Department of Physics and Astronomy, University of Pennsylvania

Lyle Ungar
University of Pennsylvania
machine learning, computational linguistics, computational social science

Eric Wong
University of Pennsylvania
Reliable Machine Learning, Optimization, Explainability, Robustness, Debugging