ConSim: Measuring Concept-Based Explanations' Effectiveness with Automated Simulatability

📅 2025-01-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current evaluation of concept-based explanations faces two key challenges: there is no unified metric that assesses both explanation clarity and the suitability of the concept library, and evaluation relies heavily on labor-intensive human studies, which limits scalability. To address these, we propose ConSim, a framework that takes *simulatability* as its core evaluation paradigm. ConSim employs large language models (LLMs) as automated simulators that predict the explained black-box model's outputs solely from the conceptual explanations, enabling an end-to-end assessment that covers both the quality of the concept space and how well it is communicated to users. The framework formalizes a concept-to-prediction mapping and establishes a standardized evaluation protocol that applies across models and datasets. Validation across multiple models and datasets indicates that ConSim's rankings correlate strongly with human judgments (Spearman's ρ > 0.85) while substantially improving evaluation efficiency, reproducibility, and scalability.

📝 Abstract
Concept-based explanations work by mapping complex model computations to human-understandable concepts. Evaluating such explanations is very difficult, as it includes not only the quality of the induced space of possible concepts but also how effectively the chosen concepts are communicated to users. Existing evaluation metrics often focus solely on the former, neglecting the latter. We introduce an evaluation framework for measuring concept explanations via automated simulatability: a simulator's ability to predict the explained model's outputs based on the provided explanations. This approach accounts for both the concept space and its interpretation in an end-to-end evaluation. Human studies for simulatability are notoriously difficult to enact, particularly at the scale of a wide, comprehensive empirical evaluation (which is the subject of this work). We propose using large language models (LLMs) as simulators to approximate the evaluation and report various analyses to make such approximations reliable. Our method allows for scalable and consistent evaluation across various models and datasets. We report a comprehensive empirical evaluation using this framework and show that LLMs provide consistent rankings of explanation methods. Code available at https://github.com/AnonymousConSim/ConSim
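The simulatability idea described in the abstract lends itself to a compact formulation: an LLM simulator receives an input together with a concept-based explanation and must guess the explained model's output, and an explanation method is scored by how well those guesses match the model. Below is a minimal sketch of that loop, assuming a text-classification setting; the `build_prompt` format, the `llm_predict` callable, and the baseline comparison are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of automated simulatability scoring, assuming a text
# classification setting. `llm_predict` stands in for any LLM call that
# returns a label string; prompt format and helper names are illustrative.

def build_prompt(example_text: str, explanation: str | None) -> str:
    # The simulator sees the input and (optionally) the concept-based
    # explanation, but never the explained model's actual output.
    lines = [
        "You are simulating a black-box classifier.",
        f"Input: {example_text}",
    ]
    if explanation is not None:
        lines.append(f"Concept-based explanation of the model: {explanation}")
    lines.append("Predict the label the classifier would assign. Reply with the label only.")
    return "\n".join(lines)


def simulatability_accuracy(llm_predict, examples, explanations, model_outputs) -> float:
    """Fraction of examples on which the simulator recovers the explained
    model's prediction from the explanation (or from the input alone)."""
    hits = 0
    for text, expl, y_model in zip(examples, explanations, model_outputs):
        y_sim = llm_predict(build_prompt(text, expl)).strip()
        hits += int(y_sim == y_model)
    return hits / len(examples)


# One natural way to rank an explanation method (an assumption of this sketch,
# not a claim about ConSim's exact metric) is its gain over a no-explanation
# baseline, isolating the contribution of the explanations themselves:
#   gain = simulatability_accuracy(llm, X, explanations, y_model) \
#        - simulatability_accuracy(llm, X, [None] * len(X), y_model)
```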
Problem

Research questions and friction points this paper is trying to address.

Conceptual Explanations Evaluation
Clarity Measurement
Assessment Methodologies
Innovation

Methods, ideas, or system contributions that make the work stand out.

ConSim
Automated Simulatability
Large Language Model Evaluation