🤖 AI Summary
To address visual inconsistencies, such as mismatched noise, motion blur, and depth of field, between virtual content and a live video stream in augmented reality (AR), this paper proposes a real-time distortion modeling method that needs no explicit camera calibration step. The key idea is to run modern image restoration methods (denoising, motion deblurring, and depth-of-field removal) on the incoming video; the restored frames take the role of the clean reference that a calibration target would normally provide, yielding self-calibration. From the restored and original frames, the method instantly estimates noise, motion blur, and depth-of-field parameters, which can then auto-tune any black-box real-time simulation of these effects (e.g., the built-in post-processing effects of a game engine) without requiring the simulation to be differentiable. This removes both limitations of prior work, namely the explicit calibration step and the dependence on slow, specially tuned differentiable methods, and delivers fast, high-fidelity visual consistency between composited virtual content and the real video.
📝 Abstract
Real camera footage is subject to noise, motion blur (MB) and depth of field (DoF). In some applications these might be considered distortions to be removed, but in others it is important to model them, because simply removing them would be ineffective or would interfere with an aesthetic choice. In augmented reality applications where virtual content is composited into a live video feed, we can model noise, MB and DoF to make the virtual content visually consistent with the video. Existing methods for this typically suffer from two main limitations. First, they require a camera calibration step to relate a known calibration target to the specific camera's response. Second, existing work requires methods that can be (differentiably) tuned to the calibration, such as slow and specialized neural networks. We propose a method that estimates parameters for noise, MB and DoF instantly, which allows using off-the-shelf real-time simulation methods, e.g., from a game engine, when compositing augmented content. Our main idea is to unlock both features by showing how modern computer vision methods that remove noise, MB and DoF from the video stream can essentially provide self-calibration. This allows auto-tuning any black-box real-time noise+MB+DoF method to deliver fast, high-fidelity augmentation consistency.
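To make the self-calibration idea concrete, here is a minimal toy sketch for the noise component only: a denoised frame stands in for the clean reference that a calibration target would normally supply, and the residual between the original and denoised frames yields an instant noise-level estimate that could tune a black-box noise simulator. The function names, the crude box-filter denoiser, and the single-parameter (Gaussian sigma) noise model are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np


def box_denoise(frame, k=5):
    # Crude stand-in for a modern learned denoiser: a k x k box filter.
    pad = k // 2
    padded = np.pad(frame, pad, mode="edge")
    out = np.zeros(frame.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + frame.shape[0], dx:dx + frame.shape[1]]
    return out / (k * k)


def estimate_noise_sigma(frame, denoise=box_denoise):
    """Self-calibration for noise: the denoised frame plays the role of a
    calibration target, so the residual exposes the camera's noise level."""
    residual = frame.astype(np.float64) - denoise(frame)
    return float(residual.std())


# Demo on synthetic data: a flat gray frame with known Gaussian noise.
rng = np.random.default_rng(0)
clean = np.full((128, 128), 0.5)
noisy = clean + rng.normal(0.0, 0.05, clean.shape)
sigma_hat = estimate_noise_sigma(noisy)  # should recover roughly 0.05
```

The recovered `sigma_hat` would then be fed to whatever real-time grain/noise effect the compositing engine exposes; since only the estimated parameter crosses the boundary, the simulator itself never needs to be differentiable.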