🤖 AI Summary
This work introduces the “Visual Jenga” task, which aims to uncover implicit inter-object dependencies in images through physical and geometric consistency constraints: objects are removed one by one, while preserving scene coherence, until only the background remains. Methodologically, the authors propose a training-free, data-driven counterfactual inpainting approach: leveraging a large pretrained image inpainting model, they perform masked-reconstruction comparisons on object pairs to quantify relational asymmetry and infer structural support relationships. The approach requires no fine-tuning or human annotations, yet effectively ranks object dependencies directly on real-world images; it outperforms existing baselines and automatically recovers hierarchical support structures that align with physical intuition. By grounding causal reasoning in geometric and physical plausibility, Visual Jenga offers a new framework for causal understanding in visual perception and scene understanding.
📝 Abstract
This paper proposes a novel scene understanding task called Visual Jenga. Drawing inspiration from the game Jenga, the proposed task involves progressively removing objects from a single image until only the background remains. Just as Jenga players must understand structural dependencies to maintain tower stability, our task reveals the intrinsic relationships between scene elements by systematically exploring which objects can be removed while preserving scene coherence in both the physical and geometric sense. As a starting point for tackling the Visual Jenga task, we propose a simple, data-driven, training-free approach that is surprisingly effective on a range of real-world images. The principle behind our approach is to utilize the asymmetry in the pairwise relationships between objects within a scene and employ a large inpainting model to generate a set of counterfactuals to quantify the asymmetry.
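The closing idea, quantifying pairwise asymmetry with inpainting counterfactuals and turning it into a removal order, can be sketched as follows. This is a minimal illustration, not the authors' exact pipeline: `counterfactual_presence` and the toy probability table are hypothetical stand-ins for the inpainting-model queries (the probability that one object remains plausible after another is masked and inpainted); only the asymmetry-ranking logic reflects the stated principle.

```python
def dependency_asymmetry(a, b, counterfactual_presence):
    """Positive score suggests `a` depends on `b` (b supports a):
    removing a leaves b intact more often than removing b leaves a intact."""
    return (counterfactual_presence(mask=a, probe=b)
            - counterfactual_presence(mask=b, probe=a))

def removal_order(objects, counterfactual_presence):
    """Greedily remove the most dependent (least supporting) object first,
    mimicking a Jenga-style teardown toward the background."""
    remaining = list(objects)
    order = []
    while remaining:
        def total_dependence(o):
            # How strongly o relies on the objects still in the scene.
            return sum(dependency_asymmetry(o, other, counterfactual_presence)
                       for other in remaining if other is not o)
        nxt = max(remaining, key=total_dependence)
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Hypothetical survival table for a cup on a table on the floor: masking the
# cup never disturbs the table, but masking the table usually makes an
# inpainted cup implausible (it would float). Numbers are illustrative only.
_toy = {
    ("cup", "table"): 1.0, ("table", "cup"): 0.1,
    ("cup", "floor"): 1.0, ("floor", "cup"): 0.0,
    ("table", "floor"): 1.0, ("floor", "table"): 0.05,
}
toy_presence = lambda mask, probe: _toy[(mask, probe)]

print(removal_order(["cup", "table", "floor"], toy_presence))
# → ['cup', 'table', 'floor']: each object is removed before its supports
```

In a real system the oracle would be replaced by repeated inpainting of each object's segmentation mask and a check for the other object's presence in the generated counterfactuals.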