Robust Multi-view Camera Calibration from Dense Matches

📅 2025-12-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Multi-view setups of rigidly mounted cameras, common in animal behavior studies and forensic analysis of surveillance footage, are difficult to calibrate robustly, particularly when the lenses exhibit strong radial distortion. This paper analyses the individual components of a structure-from-motion (SfM) pipeline that estimates camera intrinsics and extrinsics from dense matches, and identifies design choices that improve accuracy: (1) how to subsample the correspondences predicted by a dense matcher, and (2) which selection criteria to use when adding views incrementally. For cameras with strong radial distortion, the reported score is 79.9% versus 40.4% for vanilla VGGT. The correspondence subsampling is also demonstrated in a global SfM setting in which the poses are initialized with VGGT. According to the authors, the pipeline generalizes across a wide range of camera setups.

📝 Abstract

Estimating camera intrinsics and extrinsics is a fundamental problem in computer vision, and while advances in structure-from-motion (SfM) have improved accuracy and robustness, open challenges remain. In this paper, we introduce a robust method for pose estimation and calibration. We consider a set of rigid cameras, each observing the scene from a different perspective, which is a typical camera setup in animal behavior studies and forensic analysis of surveillance footage. Specifically, we analyse the individual components of an SfM pipeline and identify design choices that improve accuracy. Our main contributions are: (1) we investigate how best to subsample the correspondences predicted by a dense matcher so that they can be leveraged in the estimation process; and (2) we investigate selection criteria for adding views incrementally. In a rigorous quantitative evaluation, we show the effectiveness of our changes, especially for cameras with strong radial distortion (79.9% for ours vs. 40.4% for vanilla VGGT). Finally, we demonstrate our correspondence subsampling in a global SfM setting where we initialize the poses using VGGT. The proposed pipeline generalizes across a wide range of camera setups, and could thus become a useful tool for animal behavior and forensic analysis.
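
The abstract does not say which subsampling strategy the authors settle on. As a rough illustration of the general idea only, the sketch below shows one common heuristic: bucket the dense matches into a regular grid over the first image and keep the most confident matches per cell, so the retained correspondences stay spatially spread out. The function name `subsample_matches`, the parameters `cell` and `per_cell`, and the use of a per-match confidence score are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def subsample_matches(pts_a, conf, image_size, cell=32, per_cell=1):
    """Grid-based subsampling of dense matches (hypothetical helper).

    pts_a:      (N, 2) pixel coordinates (x, y) of the matches in image A
    conf:       (N,) matcher confidence per match (assumed to be available)
    image_size: (width, height) of image A in pixels
    Returns the sorted indices of the matches that are kept.
    """
    pts_a = np.asarray(pts_a, dtype=float)
    conf = np.asarray(conf, dtype=float)
    if conf.size == 0:
        return np.empty(0, dtype=int)

    width, height = image_size
    n_cols = int(np.ceil(width / cell))
    n_rows = int(np.ceil(height / cell))

    # Assign every match to a grid cell of image A.
    col = np.clip((pts_a[:, 0] // cell).astype(int), 0, n_cols - 1)
    row = np.clip((pts_a[:, 1] // cell).astype(int), 0, n_rows - 1)
    cell_id = row * n_cols + col

    # Sort by cell first, then by descending confidence inside each cell
    # (np.lexsort treats the LAST key as the primary one).
    order = np.lexsort((-conf, cell_id))
    sorted_cells = cell_id[order]

    # Rank of each match inside its own cell (0 = most confident).
    starts = np.flatnonzero(np.r_[True, sorted_cells[1:] != sorted_cells[:-1]])
    group_sizes = np.diff(np.r_[starts, len(order)])
    rank = np.arange(len(order)) - np.repeat(starts, group_sizes)

    return np.sort(order[rank < per_cell])
```

The returned indices can be used to slice the match arrays of both images before they are passed to pose estimation or bundle adjustment. A smaller `cell` keeps more matches; a larger `per_cell` trades spatial uniformity for redundancy.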

Problem

Research questions and friction points this paper is trying to address.

Robust multi-view camera calibration from dense matches

Improving accuracy in structure-from-motion pipelines

Enhancing calibration for cameras with strong radial distortion

Innovation

Methods, ideas, or system contributions that make the work stand out.

Robust multi-view camera calibration using dense matches

Improved correspondence subsampling for better pose estimation

Incremental view selection for enhanced calibration accuracy
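
The selection criteria the paper evaluates are not listed on this page. Purely as an illustration of what incremental view selection means, the sketch below orders views greedily: it starts from the pair with the most matches and then repeatedly adds the unregistered view that shares the most matches with the views already registered. The function `greedy_view_order` and the match-count criterion are assumptions for this example, not the authors' method.

```python
import numpy as np


def greedy_view_order(match_counts):
    """Greedy registration order from pairwise match counts (hypothetical helper).

    match_counts: (V, V) array; entry [i, j] is the number of matches
                  between view i and view j.
    Returns a list of view indices in the order they would be added.
    """
    m = np.asarray(match_counts, dtype=float).copy()
    n = m.shape[0]
    if n < 2:
        return list(range(n))

    m = np.maximum(m, m.T)        # make the counts symmetric
    np.fill_diagonal(m, -np.inf)  # a view never pairs with itself

    # Seed with the best-connected pair.
    i, j = np.unravel_index(np.argmax(m), m.shape)
    order = [int(i), int(j)]
    remaining = set(range(n)) - set(order)

    # Add the view with the largest total overlap with registered views.
    while remaining:
        candidates = sorted(remaining)
        scores = [m[c, order].sum() for c in candidates]
        best = candidates[int(np.argmax(scores))]
        order.append(best)
        remaining.remove(best)
    return order
```

Other criteria, such as the spatial coverage of the matches or the number of already triangulated points a view observes, would slot into the same loop by changing how `scores` is computed.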
