🤖 AI Summary
Aerial holistic scene understanding is hindered by the lack of high-fidelity datasets that support joint semantic and geometric reconstruction. To address this, we introduce ClaraVid, a synthetic benchmark tailored for this task, comprising 16,917 high-resolution (4032×3024) images with dense depth maps, panoptic segmentation masks, sparse point clouds, and dynamic object masks. We propose the Delentropic Scene Profile (DSP), a novel complexity metric based on differential entropy, and use it to establish a quantitative positive correlation between scene complexity and reconstruction error: scenes with higher delentropy are reconstructed less accurately. Furthermore, we design a low-artifact, landscape-agnostic synthetic pipeline built on a customized Unreal Engine framework, and develop a systematic evaluation framework encompassing Neural Radiance Fields (NeRF) and multi-task supervised learning. Empirical results indicate that DSP helps attribute model performance degradation to scene complexity, providing interpretable guidance for algorithm selection and data augmentation strategies.
📝 Abstract
The development of aerial holistic scene understanding algorithms is hindered by the scarcity of comprehensive datasets that enable both semantic and geometric reconstruction. While synthetic datasets offer an alternative, existing options exhibit task-specific limitations, unrealistic scene compositions, and rendering artifacts that compromise real-world applicability. We introduce ClaraVid, a synthetic aerial dataset specifically designed to overcome these limitations. Comprising 16,917 high-resolution images captured at 4032×3024 from multiple viewpoints across diverse landscapes, ClaraVid provides dense depth maps, panoptic segmentation, sparse point clouds, and dynamic object masks, while mitigating common rendering artifacts. To further advance neural reconstruction, we introduce the Delentropic Scene Profile (DSP), a novel complexity metric derived from differential entropy analysis, designed to quantitatively assess scene difficulty and inform reconstruction tasks. Utilizing DSP, we systematically benchmark neural reconstruction methods, uncovering a consistent, measurable correlation between scene complexity and reconstruction accuracy. Empirical results indicate that higher delentropy strongly correlates with increased reconstruction errors, validating DSP as a reliable complexity prior. The work is currently under review; upon acceptance, the data and code will be available at https://rdbch.github.io/claravid.
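To make the notion of delentropy more concrete, the following is a minimal sketch of one common formulation: half the Shannon entropy of the joint histogram of horizontal and vertical image gradients. This is an illustrative assumption, not the paper's DSP computation, which the abstract does not specify. The function name `delentropy`, the forward-difference gradient, and the `bin_width` parameter are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def delentropy(image, bin_width=1.0):
    """Half the Shannon entropy (in bits) of the joint (dx, dy) gradient histogram.

    `image` is a 2D list of grayscale intensities, at least 2x2 in size.
    Assumption: forward differences and uniform bins of width `bin_width`;
    this is an illustrative formulation, not the paper's DSP.
    """
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows - 1):
        for c in range(cols - 1):
            dx = image[r][c + 1] - image[r][c]  # horizontal forward difference
            dy = image[r + 1][c] - image[r][c]  # vertical forward difference
            # Quantize the gradient vector into a 2D histogram bin.
            key = (math.floor(dx / bin_width), math.floor(dy / bin_width))
            counts[key] += 1
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 0.5 * entropy


# A flat image has a single gradient bin; a checkerboard has more than one.
flat = [[128] * 4 for _ in range(4)]
checker = [[255 * ((r + c) % 2) for c in range(4)] for r in range(4)]
flat_score = delentropy(flat)
checker_score = delentropy(checker)
```

Under this formulation, a textureless image scores zero and images with more varied gradient structure score higher, which matches the intuition that higher delentropy indicates a harder scene to reconstruct.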