Trajectory Densification and Depth from Perspective-based Blur

📅 2025-12-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Perspective-dependent blur induced by camera rotation during handheld capture exhibits depth-varying blur kernels, posing a fundamental challenge for monocular depth estimation. Method: We propose a blur-pattern–aware monocular depth estimation framework that jointly models the spatial distribution of motion blur and camera trajectory in video sequences. Leveraging sliding-window embedding and multi-window aggregation, we employ vision-language models to densely interpolate sparse point trajectories obtained via point tracking, thereby enhancing the fidelity of depth–blur mapping. Contribution/Results: To our knowledge, this is the first work to integrate the depth-dependent nature of perspective blur with vision-language priors, enabling metric-scale depth prediction and high-fidelity trajectory reconstruction without stabilization hardware. Extensive evaluation on multiple standard depth benchmarks demonstrates significant improvements over state-of-the-art unsupervised and self-supervised methods—achieving broader depth range coverage, superior generalization, and up to 32% reduction in trajectory reconstruction error.

Technology Category

Application Category

📝 Abstract
In the absence of a mechanical stabilizer, the camera undergoes inevitable rotational dynamics during capturing, which induces perspective-based blur especially under long-exposure scenarios. From an optical standpoint, perspective-based blur is depth-position-dependent: objects residing at distinct spatial locations incur different blur levels even under the same imaging settings. Inspired by this, we propose a novel method that estimate metric depth by examining the blur pattern of a video stream and dense trajectory via joint optical design algorithm. Specifically, we employ off-the-shelf vision encoder and point tracker to extract video information. Then, we estimate depth map via windowed embedding and multi-window aggregation, and densify the sparse trajectory from the optical algorithm using a vision-language model. Evaluations on multiple depth datasets demonstrate that our method attains strong performance over large depth range, while maintaining favorable generalization. Relative to the real trajectory in handheld shooting settings, our optical algorithm achieves superior precision and the dense reconstruction maintains strong accuracy.
Problem

Research questions and friction points this paper is trying to address.

Estimates metric depth from perspective-based blur patterns in videos
Densifies sparse trajectories using a vision-language model
Achieves accurate depth and trajectory reconstruction in handheld settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Estimates depth from perspective-based blur patterns
Densifies sparse trajectories using vision-language model
Employs windowed embedding and multi-window aggregation
🔎 Similar Papers
No similar papers found.
T
Tianchen Qiu
College of Optical Science and Engineering, Zhejiang University, Hangzhou, China
Qirun Zhang
Qirun Zhang
College of Optical Science and Engineering, Zhejiang University, Hangzhou, China
J
Jiajian He
College of Optical Science and Engineering, Zhejiang University, Hangzhou, China
Z
Zhengyue Zhuge
College of Optical Science and Engineering, Zhejiang University, Hangzhou, China
Jiahui Xu
Jiahui Xu
ETH Zurich
Electronic Design AutomationFormal Verification
Yueting Chen
Yueting Chen
Zhejiang University
Computational imagingImage processingOptial Engineering