🤖 AI Summary
Existing novel view synthesis (NVS) benchmarks lack high-quality dynamic scenes and multimodal ground truth, hindering robust training and fair evaluation of 4D reconstruction and neural rendering models. To address this, we introduce AnimNVS, the first NVS benchmark built from high-fidelity animated films and targeting animation-film-grade quality. We generate dense, multimodal ground truth including RGB, depth, surface normals, optical flow, and instance-level object segmentation. Furthermore, we propose a three-tiered evaluation protocol covering dense, sparse, and monocular settings. Leveraging differentiable camera modeling and physically consistent, lighting-aware rendering, we synthesize over 100,000 high-fidelity frames. AnimNVS significantly improves the generalization, geometric consistency, and motion modeling of NVS methods on complex dynamic scenes, establishing a unified evaluation standard for 4D scene understanding.
📝 Abstract
This paper presents a new dataset for novel view synthesis, generated from a high-quality animated film with stunning realism and intricate detail. Our dataset captures a variety of dynamic scenes, complete with detailed textures, lighting, and motion, making it ideal for training and evaluating cutting-edge 4D scene reconstruction and novel view generation models. In addition to high-fidelity RGB images, we provide multiple complementary modalities, including depth, surface normals, object segmentation, and optical flow, enabling a deeper understanding of scene geometry and motion. The dataset is organised into three distinct benchmarking scenarios: a dense multi-view camera setup, a sparse camera arrangement, and monocular video sequences, enabling a wide range of experimentation and comparison across varying levels of data sparsity. With its combination of visual richness, high-quality annotations, and diverse experimental setups, this dataset offers a unique resource for pushing the boundaries of view synthesis and 3D vision.