Benchmarking Counterfactual Image Generation

๐Ÿ“… 2024-03-29
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 4
โœจ Influential: 0
๐Ÿค– AI Summary
This work addresses the problem of counterfactual image generation methods producing edits that violate the causal logic intrinsic to images. To this end, it proposes the first unified benchmark framework for systematically evaluating causal consistency and visual fidelity without requiring ground-truth labels. Methodologically, it integrates structural causal models (SCMs) with hierarchical variational autoencoders (Hierarchical VAEs), establishing a multi-model, multi-dataset, multi-causal-graph evaluation paradigm, and introduces tailored metrics, including causal consistency, to quantify alignment with the underlying causal mechanisms. Experimental results show that Hierarchical VAEs significantly outperform GAN- and flow-based baselines in both natural and medical imaging domains, highlighting their generalizability across modalities. The framework is released as an open-source, extensible Python benchmark package, enabling community-wide reproducibility, validation, and extension.
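The summary's abduction-based counterfactual workflow can be illustrated on a toy example. The sketch below is hypothetical and not the paper's code: it uses a minimal linear SCM (thickness causes intensity) to show the abduction-action-prediction steps behind counterfactual generation, plus a "composition" check (a null intervention should reproduce the observation), one common proxy for causal consistency; all function names and the coefficient `a` are invented for illustration.

```python
# Hypothetical toy sketch, NOT the benchmark's implementation: counterfactual
# inference on a linear SCM i = a * t + u via abduction-action-prediction.

def abduct(t, i, a=2.0):
    """Abduction: recover the exogenous noise u from an observation (t, i)."""
    return i - a * t

def predict(t_new, u, a=2.0):
    """Action + prediction: intensity under the intervention do(thickness = t_new)."""
    return a * t_new + u

def composition_error(t, i, a=2.0):
    """Composition check: do(t = t) should reproduce the observed intensity."""
    u = abduct(t, i, a)
    return abs(predict(t, u, a) - i)

t_obs, i_obs = 1.5, 4.0
u = abduct(t_obs, i_obs)               # recovered noise: 1.0
i_cf = predict(3.0, u)                 # counterfactual intensity: 7.0
err = composition_error(t_obs, i_obs)  # 0.0 for this deterministic SCM
```

In the benchmarked models the abduction step is performed by a learned encoder (e.g. a Hierarchical VAE) rather than an invertible formula, which is why composition and related consistency metrics are needed to quantify how faithfully it is carried out.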

๐Ÿ“ Abstract
Generative AI has revolutionised visual content editing, empowering users to effortlessly modify images and videos. However, not all edits are equal. To perform realistic edits in domains such as natural or medical imaging, modifications must respect the causal relationships inherent to the data generation process. Such image editing falls into the counterfactual image generation regime. Evaluating counterfactual image generation is substantially complex: it not only lacks observable ground truths but also requires adherence to causal constraints. Although several counterfactual image generation methods and evaluation metrics exist, a comprehensive comparison within a unified setting is lacking. We present a comparison framework to thoroughly benchmark counterfactual image generation methods. We integrate all models that have been used for the task at hand and expand them to novel datasets and causal graphs, demonstrating the superiority of Hierarchical VAEs across most datasets and metrics. Our framework is implemented in a user-friendly Python package that can be extended to incorporate additional SCMs, causal methods, generative models, and datasets for the community to build on. Code: https://github.com/gulnazaki/counterfactual-benchmark.
Problem

Research questions and friction points this paper is trying to address.

Counterfactual Image Generation
Performance Evaluation
Causal Consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Counterfactual Image Generation
Hierarchical Variational Autoencoder
Comprehensive Evaluation Framework
Thomas Melistas
National & Kapodistrian University of Athens, Greece; Archimedes/Athena RC, Greece; The University of Edinburgh, UK
Nikos Spyrou
National & Kapodistrian University of Athens, Greece; Archimedes/Athena RC, Greece; The University of Edinburgh, UK
Nefeli Gkouti
National & Kapodistrian University of Athens, Greece; Archimedes/Athena RC, Greece; The University of Edinburgh, UK
Pedro Sanchez
The University of Edinburgh, UK
Athanasios Vlontzos
Imperial College London, UK; Spotify
G. Papanastasiou
Archimedes/Athena RC, Greece; The University of Essex, UK
S. Tsaftaris
Archimedes/Athena RC, Greece; The University of Edinburgh, UK