LD-ViCE: Latent Diffusion Model for Video Counterfactual Explanations

📅 2025-09-10
🤖 AI Summary
Problem: Video AI systems in safety-critical domains (e.g., autonomous driving, medical diagnosis) remain hard to interpret. Existing explanation methods often lack temporal coherence, robustness, and actionable causal insight, and current counterfactual methods typically do not use guidance from the target model, which reduces semantic fidelity. Method: We propose LD-ViCE, a target-model-guided latent diffusion framework that generates video counterfactuals in latent space, uses gradient-based feedback from the target model for semantic alignment, and applies an additional refinement step to improve visual realism and inter-frame consistency. Contribution/Results: Evaluated on three video datasets (EchoNet-Dynamic, FERV39k, Something-Something V2), LD-ViCE improves the R² score by up to 68% over a recent state-of-the-art method while halving inference time. Qualitative analysis indicates that the generated explanations are semantically meaningful and temporally coherent.
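The idea of steering a latent with gradient feedback from the target model can be illustrated with a toy sketch. This is not the LD-ViCE implementation: the linear regressor standing in for the target model, the squared-error objective, and every function name below are assumptions made for illustration, and a real pipeline would interleave such updates with the diffusion model's denoising steps.

```python
# Toy sketch of target-model-guided latent updates (not the LD-ViCE code).
# All names and the linear "target model" are hypothetical stand-ins.

def target_model(latent, weights):
    """Stand-in target model: a linear regressor on the latent vector."""
    return sum(w * z for w, z in zip(weights, latent))


def guidance_gradient(latent, weights, target_value):
    """Gradient of the squared error (f(z) - target)^2 with respect to z."""
    error = target_model(latent, weights) - target_value
    return [2.0 * error * w for w in weights]


def guided_step(latent, weights, target_value, step_size):
    """Move the latent against the gradient so the prediction nears the target."""
    grad = guidance_gradient(latent, weights, target_value)
    return [z - step_size * g for z, g in zip(latent, grad)]


def generate_counterfactual_latent(latent, weights, target_value,
                                   steps=50, step_size=0.05):
    """Apply repeated guided steps; denoising is omitted in this sketch."""
    for _ in range(steps):
        latent = guided_step(latent, weights, target_value, step_size)
    return latent


if __name__ == "__main__":
    weights = [0.5, -1.0, 2.0]   # assumed toy model parameters
    start = [1.0, 1.0, 1.0]      # assumed latent of the original input
    result = generate_counterfactual_latent(start, weights, target_value=4.0)
    print(target_model(start, weights), target_model(result, weights))
```

With these toy weights the step size is small enough for the updates to converge; a larger step size or larger weights could make the iteration diverge, which is one reason guidance strength is usually treated as a tunable parameter.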

📝 Abstract
Video-based AI systems are increasingly adopted in safety-critical domains such as autonomous driving and healthcare. However, interpreting their decisions remains challenging due to the inherent spatiotemporal complexity of video data and the opacity of deep learning models. Existing explanation techniques often suffer from limited temporal coherence, insufficient robustness, and a lack of actionable causal insights. Current counterfactual explanation methods typically do not incorporate guidance from the target model, reducing semantic fidelity and practical utility. We introduce Latent Diffusion for Video Counterfactual Explanations (LD-ViCE), a novel framework designed to explain the behavior of video-based AI models. Compared to previous approaches, LD-ViCE reduces the computational costs of generating explanations by operating in latent space using a state-of-the-art diffusion model, while producing realistic and interpretable counterfactuals through an additional refinement step. Our experiments demonstrate the effectiveness of LD-ViCE across three diverse video datasets: EchoNet-Dynamic (cardiac ultrasound), FERV39k (facial expression), and Something-Something V2 (action recognition). LD-ViCE outperforms a recent state-of-the-art method, achieving an increase in R² score of up to 68% while reducing inference time by half. Qualitative analysis confirms that LD-ViCE generates semantically meaningful and temporally coherent explanations, offering valuable insights into the target model behavior. LD-ViCE represents a valuable step toward the trustworthy deployment of AI in safety-critical domains.
Problem

Research questions and friction points this paper is trying to address.

Interpreting video-based AI decisions despite the spatiotemporal complexity of video data
Overcoming the limited temporal coherence of existing explanation techniques
Generating realistic counterfactuals with guidance from the target model, which current methods typically omit
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent space diffusion model reduces computational costs
Additional refinement step enhances realism and interpretability
Outperforms a recent state-of-the-art method, with up to 68% higher R² score and half the inference time
Authors

Payal Varshney
Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau, Kaiserslautern, Germany; German Research Center for Artificial Intelligence GmbH (DFKI), Kaiserslautern, Germany
Adriano Lucieri
PhD Researcher, DFKI GmbH
Medical Image Analysis, Explainable AI (XAI), Privacy-Preserving Deep Learning, Computer-Aided Diagnosis
Christoph Balada
Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau, Kaiserslautern, Germany; German Research Center for Artificial Intelligence GmbH (DFKI), Kaiserslautern, Germany
Sheraz Ahmed
German Research Center for Artificial Intelligence - DFKI GmbH
Andreas Dengel
Professor of Computer Science, University of Kaiserslautern & Executive Director, DFKI
Artificial Intelligence, Machine Learning, Document Analysis, Semantic Technologies