ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?

📅 2026-03-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current visual generative models exhibit limited capabilities in physical, causal, and complex spatial reasoning, yet existing evaluation methods often rely on superficial metrics or fragmented benchmarks that fail to accurately assess true reasoning proficiency. To address this gap, this work proposes a unified evaluation framework for visual generative reasoning, featuring cross-modal (image–video) task design, dual-track assessment of both generation processes and outputs, an evidence-driven automatic scoring mechanism, and fine-grained analysis grounded in cognitive dimensions. Experiments across more than twenty state-of-the-art models reveal substantial deficiencies even in the most advanced systems, thereby demonstrating the effectiveness and necessity of the proposed framework as a “stress test” for next-generation intelligent visual models.
📝 Abstract
Beneath the stunning visual fidelity of modern AIGC models lies a "logical desert", where systems fail tasks that require physical, causal, or complex spatial reasoning. Current evaluations largely rely on superficial metrics or fragmented benchmarks, creating a "performance mirage" that overlooks the generative process. To address this, we introduce ViGoR (Vision-Generative Reasoning-centric Benchmark), a unified framework designed to dismantle this mirage. ViGoR distinguishes itself through four key innovations: 1) holistic cross-modal coverage bridging Image-to-Image and Video tasks; 2) a dual-track mechanism evaluating both intermediate processes and final results; 3) an evidence-grounded automated judge ensuring high human alignment; and 4) granular diagnostic analysis that decomposes performance into fine-grained cognitive dimensions. Experiments on over 20 leading models reveal that even state-of-the-art systems harbor significant reasoning deficits, establishing ViGoR as a critical "stress test" for the next generation of intelligent vision models. The demo is available at https://vincenthancoder.github.io/ViGoR-Bench/
Problem

Research questions and friction points this paper is trying to address.

visual generative models
zero-shot visual reasoning
reasoning evaluation
performance mirage
cognitive reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

visual generative reasoning
zero-shot evaluation
cross-modal benchmark
process-aware assessment
cognitive diagnostic analysis