🤖 AI Summary
This work addresses geometric camera calibration from uncalibrated, unpreprocessed outdoor images that exhibit complex optical distortions. We propose a physics-driven, end-to-end, ray-based calibration method. Our key contributions are: (1) the first integration of diffusion models into physically consistent ray alignment, replacing semantic features with explicit geometric prior modeling; (2) an edge-aware attention mechanism that increases sensitivity to distortion near image edges; and (3) a large-scale, ray-traced distortion prior database comprising over 3,000 real-world lens parameter configurations. Evaluated on real-world datasets, our method reduces the angular error of estimated ray bundles by about 8.2°, substantially outperforming existing unsupervised and weakly supervised calibration approaches. The framework establishes a new paradigm for high-precision 3D perception in open, unconstrained environments.
📝 Abstract
Accurate camera calibration is a fundamental task for 3D perception, especially in real-world, in-the-wild environments where complex optical distortions are common. Existing methods often rely on pre-rectified images or calibration patterns, which limits their applicability and flexibility. In this work, we introduce AlignDiff, a diffusion model conditioned on geometric priors that jointly models camera intrinsic and extrinsic parameters using a generic ray camera model, enabling the simultaneous estimation of camera distortions and scene geometry. Unlike previous approaches, AlignDiff shifts focus from semantic to geometric features, enabling more accurate modeling of local distortions. To improve distortion prediction, we incorporate edge-aware attention, which focuses the model on geometric features around image edges rather than on semantic content. Furthermore, to improve generalizability to real-world captures, we incorporate a large database of ray-traced lenses containing over three thousand samples. This database characterizes the distortion inherent in a diverse variety of lens forms. Our experiments demonstrate that the proposed method reduces the angular error of estimated ray bundles by ~8.2 degrees and improves overall calibration accuracy, outperforming existing approaches on challenging, real-world datasets.
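To make the headline metric concrete, the following is a minimal sketch of how an angular error between two ray bundles could be computed: each pixel is associated with a 3D ray direction, and the error is the mean angle between corresponding predicted and reference rays. The function names and the toy data are hypothetical and chosen for illustration; the abstract does not specify the exact evaluation protocol, so this should not be read as the authors' implementation.

```python
import math


def angle_between_deg(a, b):
    """Angle in degrees between two 3D direction vectors (need not be unit length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_angle))


def mean_angular_error_deg(pred_rays, ref_rays):
    """Mean per-ray angular error (degrees) between two equally sized ray bundles."""
    if len(pred_rays) != len(ref_rays) or not pred_rays:
        raise ValueError("ray bundles must be non-empty and of equal length")
    total = sum(angle_between_deg(p, r) for p, r in zip(pred_rays, ref_rays))
    return total / len(pred_rays)


# Toy example (assumed data): two reference rays along the optical axis,
# one predicted ray tilted by 10 degrees and one predicted exactly.
tilt = math.radians(10.0)
ref_rays = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
pred_rays = [(0.0, math.sin(tilt), math.cos(tilt)), (0.0, 0.0, 1.0)]
print(mean_angular_error_deg(pred_rays, ref_rays))
```

Because the metric compares ray directions rather than parametric coefficients, it applies to any lens model that can be expressed as a per-pixel ray bundle, which is the appeal of a generic ray camera model.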