Causally Steered Diffusion for Automated Video Counterfactual Generation

📅 2025-06-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the causal integrity degradation commonly observed in video counterfactual editing, this paper proposes a black-box, fine-tuning-free, causally aware diffusion framework. Methodologically, it introduces a causal-graph-driven text prompt optimization paradigm, combining vision-language model (VLM) reasoning with prompt optimization to steer pre-trained latent diffusion models (LDMs) toward causally coherent "what-if" videos, without requiring access to the target video editing model's architecture or parameters. The framework jointly optimizes counterfactual causal faithfulness and minimal intervention, making it compatible with arbitrary off-the-shelf video editing systems. Experiments demonstrate improvements over baselines on both perceptual quality metrics (FVD, LPIPS) and counterfactual-specific metrics (causal effectiveness, minimality). To the authors' knowledge, this is the first work to achieve causally plausible counterfactual video generation solely via prompt optimization within the native LDM distribution.

📝 Abstract
Adapting text-to-image (T2I) latent diffusion models for video editing has shown strong visual fidelity and controllability, but challenges remain in maintaining causal relationships in video content. Edits affecting causally dependent attributes risk generating unrealistic or misleading outcomes if these relationships are ignored. In this work, we propose a causally faithful framework for counterfactual video generation, guided by a vision-language model (VLM). Our method is agnostic to the underlying video editing system and does not require access to its internal mechanisms or fine-tuning. Instead, we guide the generation by optimizing text prompts based on an assumed causal graph, addressing the challenge of latent space control in LDMs. We evaluate our approach using standard video quality metrics and counterfactual-specific criteria, such as causal effectiveness and minimality. Our results demonstrate that causally faithful video counterfactuals can be effectively generated within the learned distribution of LDMs through prompt-based causal steering. With its compatibility with any black-box video editing system, our method holds significant potential for generating realistic "what-if" video scenarios in diverse areas such as healthcare and digital media.
Problem

Research questions and friction points this paper is trying to address.

Maintaining causal relationships in video counterfactual generation
Avoiding unrealistic outcomes from edits on dependent attributes
Controlling latent space in diffusion models via causal prompts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causally faithful framework for video counterfactuals
Vision-language model guides prompt optimization
Compatible with black-box video editing systems
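The black-box prompt steering described above can be sketched as a simple search loop: candidate edit prompts are scored for how well they cover the intervened attribute and its causal descendants (causal effectiveness) while staying short (minimality), and the best candidate is forwarded to the editing system. This is an illustrative toy, assuming a dictionary causal graph and a substring-based stand-in for the VLM scorer; `score_prompt`, `steer_prompt`, and the example graph are hypothetical names, not the paper's implementation.

```python
def score_prompt(prompt, causal_graph, intervention):
    """Toy stand-in for a VLM-based score: reward prompts that mention
    the intervened attribute and its causal descendants, and penalize
    length as a proxy for minimal intervention."""
    required = {intervention} | set(causal_graph.get(intervention, []))
    mentioned = sum(1 for attr in required if attr in prompt)
    brevity_penalty = 0.01 * len(prompt.split())
    return mentioned - brevity_penalty

def steer_prompt(candidates, causal_graph, intervention):
    """Pick the candidate edit prompt with the best causal-faithfulness score."""
    return max(candidates, key=lambda p: score_prompt(p, causal_graph, intervention))

# Assumed toy causal graph: intervening on "age" should also change
# its downstream attributes ("gray hair", "wrinkles").
graph = {"age": ["gray hair", "wrinkles"]}
candidates = [
    "an older person",                              # ignores downstream effects
    "an older person with gray hair and wrinkles",  # causally coherent edit
]
best = steer_prompt(candidates, graph, "age")
```

In the actual framework, the scorer would be a VLM judging the edited video, and the selected prompt would be passed to any off-the-shelf black-box video editor.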
Nikos Spyrou
National & Kapodistrian University of Athens, Greece; Archimedes, Athena Research Center, Greece; The University of Edinburgh, UK
Athanasios Vlontzos
Monzo Bank, UK
Paraskevas Pegios
Technical University of Denmark, Pioneer Centre for AI
Machine Learning · Computer Vision · Explainable AI · Generative AI · Medical Image Analysis
Thomas Melistas
National & Kapodistrian University of Athens, Greece; Archimedes, Athena Research Center, Greece; The University of Edinburgh, UK
Nefeli Gkouti
National & Kapodistrian University of Athens, Greece; Archimedes, Athena Research Center, Greece; The University of Edinburgh, UK
Yannis Panagakis
Associate Professor, National and Kapodistrian University of Athens
Machine Learning · Computer Vision · Signal Processing · Optimization
G. Papanastasiou
The University of Essex, UK
S. Tsaftaris
Archimedes, Athena Research Center, Greece; The University of Edinburgh, UK