🤖 AI Summary
Current AI models struggle to infer causal relationships among visual entities in images and typically detect only superficial co-occurrence. To address this, we propose Visual Causal Discovery (VCD) as a new task and introduce VCG-32K, the first large-scale visual causal graph dataset, containing over 32,000 images annotated with entity-level causal graphs. We further develop CauSight, a causally aware vision-language model. Methodologically, CauSight combines the Tree-of-Causal-Thought (ToCT) reasoning paradigm, multimodal causal graph learning, synthetic reasoning trajectory augmentation, and reinforcement learning driven by a causal reward. Experiments show that CauSight significantly outperforms GPT-4.1 on VCD, with a 21% absolute gain that corresponds to more than a threefold improvement in performance. The source code, trained model, and VCG-32K dataset are all publicly released.
📝 Abstract
Causal thinking enables humans to understand not just what is seen, but why it happens. To replicate this capability in modern AI systems, we introduce the task of visual causal discovery, which requires models to infer cause-and-effect relations among visual entities across diverse scenarios rather than merely perceive their presence. To this end, we first construct the Visual Causal Graph dataset (VCG-32K), a large-scale collection of over 32,000 images annotated with entity-level causal graphs, and then develop CauSight, a novel vision-language model that performs visual causal discovery through causally aware reasoning. Our training recipe integrates three components: (1) training data curation from VCG-32K, (2) Tree-of-Causal-Thought (ToCT) for synthesizing reasoning trajectories, and (3) reinforcement learning with a designed causal reward to refine the reasoning policy. Experiments show that CauSight outperforms GPT-4.1 on visual causal discovery, achieving over a threefold performance boost (21% absolute gain). Our code, model, and dataset are fully open-sourced at the project page: https://github.com/OpenCausaLab/CauSight.
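To make the task concrete, the sketch below represents an entity-level causal graph as a set of directed (cause, effect) edges and scores a predicted graph against a reference with edge-level F1. This is only an illustration under stated assumptions: the abstract does not specify the annotation format or the form of the causal reward, so the `Edge` type, the `edge_f1` function, and the example entities are all hypothetical and are not taken from CauSight or VCG-32K.

```python
from typing import Set, Tuple

# Hypothetical representation: a directed edge (cause, effect) between two
# visual entities. The actual VCG-32K annotation format may differ.
Edge = Tuple[str, str]


def edge_f1(predicted: Set[Edge], reference: Set[Edge]) -> float:
    """Edge-level F1 between a predicted and a reference causal graph.

    Illustrative stand-in for a graph-matching score; it is NOT the
    causal reward used to train CauSight.
    """
    if not predicted or not reference:
        return 0.0
    # An edge counts as correct only if both endpoints and direction match.
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Made-up example scene: rain wets the road, and the wet road makes a car skid.
reference = {("rain", "wet road"), ("wet road", "skidding car")}
# The prediction gets one edge right and reverses the direction of the other.
predicted = {("rain", "wet road"), ("skidding car", "wet road")}

score = edge_f1(predicted, reference)
```

Because edges are directed, a reversed edge counts as an error. That reflects the distinction the abstract draws between inferring cause and effect and merely perceiving that two entities appear together.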