🤖 AI Summary
Most existing 3D vision methods assume static camera intrinsics, yet real-world videos frequently exhibit time-varying intrinsic parameters, especially in uncontrolled in-the-wild settings, and no prior benchmark provides frame-wise ground-truth intrinsics for such footage. To address this gap, we introduce Intrinsics in Flux (InFlux), a large-scale, real-scene dynamic camera intrinsics benchmark covering diverse indoor and outdoor environments and providing 143K+ high-resolution video frames across 386 videos with precise, frame-level intrinsic ground truth. To obtain robust, high-accuracy per-frame intrinsic estimates, we build a comprehensive lookup table of calibration experiments and extend the Kalibr toolbox. Extensive experiments reveal that state-of-the-art intrinsic prediction methods suffer substantial performance degradation on InFlux, exposing generalization limitations under temporal intrinsic variation. This work establishes a rigorous evaluation standard for dynamic intrinsics, enabling principled assessment of time-varying camera modeling and highlighting key challenges for future research.
📝 Abstract
Accurately tracking camera intrinsics is crucial for achieving 3D understanding from 2D video. However, most 3D algorithms assume that camera intrinsics stay constant throughout a video, an assumption that often fails for real-world in-the-wild footage. A major obstacle in this field is the lack of dynamic camera intrinsics benchmarks: existing benchmarks typically offer limited diversity in scene content and intrinsics variation, and none provide per-frame intrinsic annotations for consecutive video frames. In this paper, we present Intrinsics in Flux (InFlux), a real-world benchmark that provides per-frame ground-truth intrinsics annotations for videos with dynamic intrinsics. Compared to prior benchmarks, InFlux captures a wider range of intrinsic variation and scene diversity, featuring 143K+ annotated frames from 386 high-resolution indoor and outdoor videos with dynamic camera intrinsics. To ensure accurate per-frame intrinsics, we build a comprehensive lookup table of calibration experiments and extend the Kalibr toolbox to improve its accuracy and robustness. Using our benchmark, we evaluate existing baseline methods for predicting camera intrinsics and find that most struggle to achieve accurate predictions on videos with dynamic intrinsics. For the dataset, code, videos, and submission, please visit https://influx.cs.princeton.edu/.
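To make the evaluation setting concrete, the sketch below shows one plausible way to score a method's per-frame focal-length predictions against frame-level ground truth. The metric (mean relative focal-length error) and the example numbers are illustrative assumptions for this sketch, not InFlux's official evaluation protocol.

```python
# Hypothetical sketch: scoring per-frame intrinsic predictions against
# per-frame ground truth. The metric and data layout are illustrative
# assumptions, not the benchmark's official protocol.

def mean_relative_focal_error(pred_fx, gt_fx):
    """Average |fx_pred - fx_gt| / fx_gt over all frames of a video."""
    assert len(pred_fx) == len(gt_fx) and len(gt_fx) > 0
    errors = [abs(p - g) / g for p, g in zip(pred_fx, gt_fx)]
    return sum(errors) / len(errors)

# Example: a zoom-in shot where the ground-truth focal length grows
# frame by frame, but the predictor assumes static intrinsics.
gt_fx = [1000.0, 1100.0, 1250.0, 1400.0]   # per-frame ground truth (px)
pred_fx = [1000.0] * 4                     # static-intrinsics prediction
print(mean_relative_focal_error(pred_fx, gt_fx))
```

Even this toy zoom sequence yields a ~14% mean relative error for a static-intrinsics predictor, illustrating why methods tuned on fixed-intrinsics data degrade on videos whose intrinsics drift over time.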