ConSim: Measuring Concept-Based Explanations' Effectiveness with Automated Simulatability

📅 2025-01-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current evaluation of concept-based explanations faces two key challenges: there is no unified metric that assesses both explanation clarity and the suitability of the concept library, and evaluation relies heavily on labor-intensive human studies, which limits scalability. To address these, we propose ConSim, a framework that takes *simulatability* as its core evaluation paradigm. ConSim employs large language models (LLMs) as automated simulators that predict the explained black-box model's outputs solely from the conceptual explanations, enabling an end-to-end assessment that covers both the quality of the concept space and how well it is communicated to users. The framework formalizes a concept-to-prediction mapping and establishes a standardized evaluation protocol that applies across models and datasets. Validation across multiple models and datasets indicates that ConSim's rankings correlate strongly with human judgments (Spearman's ρ > 0.85) while substantially improving evaluation efficiency, reproducibility, and scalability.

📝 Abstract
Concept-based explanations work by mapping complex model computations to human-understandable concepts. Evaluating such explanations is very difficult, as it includes not only the quality of the induced space of possible concepts but also how effectively the chosen concepts are communicated to users. Existing evaluation metrics often focus solely on the former, neglecting the latter. We introduce an evaluation framework for measuring concept explanations via automated simulatability: a simulator's ability to predict the explained model's outputs based on the provided explanations. This approach accounts for both the concept space and its interpretation in an end-to-end evaluation. Human studies for simulatability are notoriously difficult to enact, particularly at the scale of a wide, comprehensive empirical evaluation (which is the subject of this work). We propose using large language models (LLMs) as simulators to approximate the evaluation and report various analyses to make such approximations reliable. Our method allows for scalable and consistent evaluation across various models and datasets. We report a comprehensive empirical evaluation using this framework and show that LLMs provide consistent rankings of explanation methods. Code available at https://github.com/AnonymousConSim/ConSim
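The simulatability idea described in the abstract lends itself to a compact formulation: an LLM simulator receives an input together with a concept-based explanation and must guess the explained model's output, and an explanation method is scored by how well those guesses match the model. Below is a minimal sketch of that loop, assuming a text-classification setting; the `build_prompt` format, the `llm_predict` callable, and the baseline comparison are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of automated simulatability scoring, assuming a text
# classification setting. `llm_predict` stands in for any LLM call that
# returns a label string; prompt format and helper names are illustrative.

def build_prompt(example_text: str, explanation: str | None) -> str:
    # The simulator sees the input and (optionally) the concept-based
    # explanation, but never the explained model's actual output.
    lines = [
        "You are simulating a black-box classifier.",
        f"Input: {example_text}",
    ]
    if explanation is not None:
        lines.append(f"Concept-based explanation of the model: {explanation}")
    lines.append("Predict the label the classifier would assign. Reply with the label only.")
    return "\n".join(lines)


def simulatability_accuracy(llm_predict, examples, explanations, model_outputs) -> float:
    """Fraction of examples on which the simulator recovers the explained
    model's prediction from the explanation (or from the input alone)."""
    hits = 0
    for text, expl, y_model in zip(examples, explanations, model_outputs):
        y_sim = llm_predict(build_prompt(text, expl)).strip()
        hits += int(y_sim == y_model)
    return hits / len(examples)


# One natural way to rank an explanation method (an assumption of this sketch,
# not a claim about ConSim's exact metric) is its gain over a no-explanation
# baseline, isolating the contribution of the explanations themselves:
#   gain = simulatability_accuracy(llm, X, explanations, y_model) \
#        - simulatability_accuracy(llm, X, [None] * len(X), y_model)
```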
Problem

Research questions and friction points this paper is trying to address.

Conceptual Explanations Evaluation
Clarity Measurement
Assessment Methodologies
Innovation

Methods, ideas, or system contributions that make the work stand out.

ConSim
Automated Simulatability
Large Language Model Evaluation