ConfCtrl: Enabling Precise Camera Control in Video Diffusion via Confidence-Aware Interpolation

📅 2026-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of large-baseline novel view synthesis from only two input images, where existing methods struggle to reconstruct occluded regions and often deviate from the prescribed camera trajectory. To overcome these limitations, we propose ConfCtrl, a framework that leverages a confidence-aware interpolation mechanism to guide a diffusion model in strictly adhering to the target camera pose while simultaneously generating missing content. Our approach integrates confidence-weighted point cloud projections with a Kalman-like prediction-update strategy to dynamically balance geometric observations against pose-driven predictions. Additionally, it employs noise latent initialization combined with learned residual correction to enhance geometric consistency and generation stability. Experiments demonstrate that our method achieves visually plausible and geometrically coherent large-baseline view synthesis across multiple datasets, effectively reconstructing occluded regions.
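The confidence-weighted initialization described above can be sketched as follows. This is an illustrative blend of a projected point-cloud latent with Gaussian noise; the function name, array shapes, and the linear blending rule are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def init_latent(proj_latent, confidence, seed=None):
    """Blend a projected point-cloud latent with Gaussian noise,
    weighting each location by its projection confidence in [0, 1].
    A sketch of the idea, not the paper's exact formulation."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(proj_latent.shape)
    # High-confidence regions keep the geometric observation;
    # low-confidence (occluded) regions start from pure noise.
    return confidence * proj_latent + (1.0 - confidence) * noise
```

With `confidence` near 1 the diffusion process starts from the reprojected geometry, while occluded regions (confidence near 0) are initialized from noise and left for the generative model to complete.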

📝 Abstract
We address the challenge of novel view synthesis from only two input images under large viewpoint changes. Existing regression-based methods lack the capacity to reconstruct unseen regions, while camera-guided diffusion models often deviate from intended trajectories due to noisy point cloud projections or insufficient conditioning from camera poses. To address these issues, we propose ConfCtrl, a confidence-aware video interpolation framework that enables diffusion models to follow prescribed camera poses while completing unseen regions. ConfCtrl initializes the diffusion process by combining a confidence-weighted projected point cloud latent with noise as the conditioning input. It then applies a Kalman-inspired predict-update mechanism, treating the projected point cloud as a noisy measurement and using learned residual corrections to balance pose-driven predictions with noisy geometric observations. This allows the model to rely on reliable projections while down-weighting uncertain regions, yielding stable, geometry-aware generation. Experiments on multiple datasets show that ConfCtrl produces geometrically consistent and visually plausible novel views, effectively reconstructing occluded regions under large viewpoint changes.
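The Kalman-inspired predict-update mechanism in the abstract can be illustrated with a minimal sketch: treat the pose-driven output as the prediction, the projected point cloud as a noisy measurement, and use the per-pixel projection confidence as the gain. The function name and the use of confidence directly as the gain are assumptions for illustration; the paper's learned residual correction is not reproduced here.

```python
import numpy as np

def kalman_like_update(prediction, measurement, confidence):
    """One Kalman-inspired update step (illustrative sketch):
    correct the pose-driven prediction toward the projected
    point-cloud 'measurement' with a gain equal to the
    projection confidence in [0, 1]."""
    residual = measurement - prediction           # innovation term
    return prediction + confidence * residual     # down-weights uncertain regions
```

With confidence 0 the update trusts the pose-driven prediction entirely; with confidence 1 it adopts the geometric observation, so reliable projections dominate while uncertain (e.g. occluded) regions fall back on the generative prior.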
Problem

Research questions and friction points this paper is trying to address.

novel view synthesis
camera control
large viewpoint changes
occluded region reconstruction
video diffusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

confidence-aware interpolation
video diffusion
camera control
novel view synthesis
Kalman-inspired mechanism
Liudi Yang
University of Freiburg

George Eskandar
University of Stuttgart
Computer Vision, Domain Adaptation, Generative AI, Autonomous Driving, 3D Reconstruction

Fengyi Shen
Technical University of Munich

Mohammad Altillawi
Huawei Heisenberg Research Center (Munich)

Yang Bai
Ludwig Maximilian University of Munich

Chi Zhang
Huawei Heisenberg Research Center (Munich)

Ziyuan Liu
Unknown affiliation
Robotics, Manipulation and Grasping, Computer Vision, Machine Learning

Abhinav Valada
Professor & Director of Robot Learning Lab, University of Freiburg
Robotics, Machine Learning, Computer Vision, Artificial Intelligence