🤖 AI Summary
Explaining visual classifiers is difficult when neither training data nor ground-truth labels are available, hindering the generation of human-understandable decision reports.
Method: This paper introduces DEXTER, a framework that combines diffusion models with large language models (LLMs) to enable zero-shot, global explanation of classifier behavior. Through text-guided, class-conditional image synthesis, DEXTER reconstructs decision patterns, subgroup biases, and attribution relationships without access to the original training data. It supports diverse explanation tasks, including activation maximization, bias detection, and debiasing analysis, via prompt-based optimization alone.
Results: Extensive experiments on ImageNet, Waterbirds, CelebA, and FairFace show that DEXTER outperforms existing data-free methods in explanation accuracy, readability, and user-rated trustworthiness. By decoupling explanation from data access, DEXTER establishes a scalable, generalizable paradigm for transparent analysis of black-box models.
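The prompt-based activation maximization described in the Method paragraph can be sketched as follows. This is a toy illustration under stated assumptions, not DEXTER's actual implementation: the "generator" and "classifier" are linear stand-ins (names `synthesize`, `G`, `W` are hypothetical), whereas the real pipeline would optimize prompts through a frozen text-to-image diffusion model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions): a frozen "generator" mapping a prompt
# embedding to an image vector, and a frozen linear "classifier".
G = rng.standard_normal((64, 16))   # generator weights: image = G @ prompt
W = rng.standard_normal((10, 64))   # classifier weights: logits = W @ image

def synthesize(prompt):
    """Stand-in for class-conditional diffusion sampling."""
    return G @ prompt

def logits(image):
    """Stand-in for the target visual classifier."""
    return W @ image

target = 3                          # class whose activation we maximize
prompt = rng.standard_normal(16) * 0.01
prompt0 = prompt.copy()             # keep the initial prompt for comparison

# Gradient ascent on the target logit w.r.t. the prompt embedding.
# For this linear toy, d(logit_target)/d(prompt) = G.T @ W[target].
grad = G.T @ W[target]
for _ in range(200):
    prompt += 0.01 * grad

# The optimized prompt now synthesizes an image that activates the
# target class more strongly than the initial prompt did.
```

In the full method, the synthetic images produced by such optimized prompts are what the LLM inspects to write class-level explanation reports.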
📝 Abstract
Understanding and explaining the behavior of machine learning models is essential for building transparent and trustworthy AI systems. We introduce DEXTER, a data-free framework that employs diffusion models and large language models to generate global, textual explanations of visual classifiers. DEXTER operates by optimizing text prompts to synthesize class-conditional images that strongly activate a target classifier. These synthetic samples are then used to elicit detailed natural language reports that describe class-specific decision patterns and biases. Unlike prior work, DEXTER enables natural language explanation of a classifier's decision process without access to training data or ground-truth labels. We demonstrate DEXTER's flexibility across three tasks (activation maximization, slice discovery and debiasing, and bias explanation), each illustrating its ability to uncover the internal mechanisms of visual classifiers. Quantitative and qualitative evaluations, including a user study, show that DEXTER produces accurate, interpretable outputs. Experiments on ImageNet, Waterbirds, CelebA, and FairFace confirm that DEXTER outperforms existing approaches in global model explanation and class-level bias reporting. Code is available at https://github.com/perceivelab/dexter.