🤖 AI Summary
Latent variables in deep generative models (e.g., VAEs, diffusion models) suffer from poor semantic interpretability. Method: This paper proposes a framework that jointly leverages latent-space perturbation and multimodal large language models (MLLMs) for interpretable reasoning. By systematically perturbing latent codes and analyzing the corresponding generative outputs—combined with inductive-bias-aligned prompting and uncertainty quantification—the method produces fine-grained, trustworthy semantic explanations. Contribution/Results: To our knowledge, this is the first work to integrate MLLMs into latent-variable interpretation, achieving high explanation fidelity and consistency. Extensive experiments on both real-world and synthetic benchmarks demonstrate strong performance, with human evaluation yielding an 82.3% inter-annotator agreement rate—significantly surpassing existing baselines.
📝 Abstract
Deep generative models like VAEs and diffusion models have advanced various generation tasks by leveraging latent variables to learn data distributions and generate high-quality samples. Although the field of explainable AI has made strides in interpreting machine learning models, understanding latent variables in generative models remains challenging. This paper introduces *LatentExplainer*, a framework for automatically generating semantically meaningful explanations of latent variables in deep generative models. *LatentExplainer* tackles three main challenges: inferring the meaning of latent variables, aligning explanations with inductive biases, and handling varying degrees of explainability. Our approach perturbs latent variables, interprets the resulting changes in the generated data, and uses multimodal large language models (MLLMs) to produce human-understandable explanations. We evaluate the proposed method on several real-world and synthetic datasets, and the results demonstrate superior performance in generating high-quality explanations for latent variables. These results also highlight the effectiveness of incorporating inductive biases and uncertainty quantification, which significantly enhance model interpretability.
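The perturbation step described above can be sketched as a latent traversal: shift one latent dimension across a range of values, decode each variant, and compare the outputs. The snippet below is a minimal illustration, not the paper's implementation; the `perturb_latent` helper and the linear stand-in decoder are hypothetical, and in the actual method the decoded samples (images) would be passed, together with an inductive-bias-aligned prompt, to an MLLM for explanation.

```python
import numpy as np

def perturb_latent(z, dim, deltas):
    """Return copies of latent code z with dimension `dim` shifted by each delta."""
    variants = []
    for d in deltas:
        z_new = z.copy()
        z_new[dim] += d
        variants.append(z_new)
    return variants

# Toy stand-in decoder: a fixed linear map from a 4-dim latent space
# to an 8-dim "sample" (a real model would decode to an image).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))

def decode(z):
    return W @ z

# Traverse latent dimension 2 and record how the decoded output changes.
z0 = np.zeros(4)
sweep = perturb_latent(z0, dim=2, deltas=[-2.0, -1.0, 0.0, 1.0, 2.0])
outputs = [decode(z) for z in sweep]
# In LatentExplainer, this sequence of generated samples would be shown
# to an MLLM, which describes the semantic attribute the dimension controls.
```

Visual inspection of such a sweep (e.g., a face gradually smiling as one dimension increases) is exactly the signal the framework asks the MLLM to verbalize.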