Fast Motion Estimation and Context-Aware Refinement for Efficient Bayer-Domain Video Vision

📅 2025-08-07
🤖 AI Summary
Video vision systems are inefficient for two reasons: high temporal redundancy between frames, and the computational overhead of conventional RGB conversion and the full image signal processor (ISP) pipeline. To address this, we propose the first end-to-end video processing framework that operates directly in the Bayer domain, bypassing RGB demosaicing and the entire ISP pipeline, and performs lightweight motion estimation on raw sensor data. Our method introduces a fast, block-matching-based motion vector (MV) prediction module, coupled with a context-aware error correction network for MV refinement. We also integrate an adaptive keyframe selection strategy that dynamically suppresses redundant computation. Evaluated on action recognition and video object detection, our approach achieves an average 2.3× speedup with negligible accuracy degradation (<0.5% mAP/Top-1 drop), reducing both front-end computational cost and temporal redundancy overhead.

📝 Abstract
Improving the efficiency of video computer vision systems remains challenging because of the high temporal redundancy within a video. Existing work on efficient video computer vision does not fully remove this temporal redundancy and neglects the front-end computation overhead. In this paper, we propose an efficient video computer vision system. First, the image signal processor (ISP) is removed and Bayer-format data is fed directly into video computer vision models, saving the front-end computation. Second, instead of optical flow models or video codecs, we propose a fast block-matching-based motion estimation algorithm designed specifically for efficient video computer vision, together with an MV refinement module. To correct the remaining errors, a context-aware block refinement network is introduced to refine regions with large error. To further balance accuracy and efficiency, a frame selection strategy is employed. Experiments on multiple video computer vision tasks demonstrate that our method achieves significant acceleration with only a slight performance loss.
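The abstract does not spell out the block-matching procedure, so the following is only a minimal sketch of generic exhaustive block matching with a sum-of-absolute-differences (SAD) cost, not the authors' algorithm. The function names, the block size, the search range, and the choice to step candidates by 2 pixels (so that a matched block keeps the same position within a 2×2 Bayer tile) are all assumptions made for illustration.

```python
def sad(cur, ref, cy, cx, ry, rx, bs):
    """Sum of absolute differences between a bs x bs block of cur and one of ref."""
    total = 0
    for dy in range(bs):
        for dx in range(bs):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def block_match(cur, ref, bs=4, search=2, step=2):
    """Exhaustive block matching between two single-channel frames (lists of rows).

    Returns {(block_y, block_x): ((dy, dx), cost)}, where (dy, dx) points from
    the block in cur to its best match in ref. step=2 is an assumption of this
    sketch: it keeps every candidate aligned to the 2x2 Bayer period.
    """
    h, w = len(cur), len(cur[0])
    result = {}
    for by in range(0, h - bs + 1, bs):
        for bx in range(0, w - bs + 1, bs):
            # Start from the zero vector so ties favour "no motion".
            best_mv, best_cost = (0, 0), sad(cur, ref, by, bx, by, bx, bs)
            for dy in range(-search, search + 1, step):
                for dx in range(-search, search + 1, step):
                    ry, rx = by + dy, bx + dx
                    # Skip candidates that fall outside the reference frame.
                    if ry < 0 or rx < 0 or ry + bs > h or rx + bs > w:
                        continue
                    cost = sad(cur, ref, by, bx, ry, rx, bs)
                    if cost < best_cost:
                        best_mv, best_cost = (dy, dx), cost
            result[(by, bx)] = (best_mv, best_cost)
    return result
```

Each block's final cost is returned alongside its motion vector, which gives a later stage a simple per-block error signal. A practical system would replace the exhaustive search with a faster pattern, which is presumably where the "fast" in the paper's method comes from.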
Problem

Research questions and friction points this paper is trying to address.

Reducing temporal redundancy in video computer vision systems
Eliminating front-end computation overhead from image processing
Balancing accuracy and efficiency with motion estimation refinement
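One way to picture the accuracy/efficiency trade-off is to spend refinement effort only where motion compensation fails. The sketch below assumes a per-block matching cost is already available and flags the blocks whose mean absolute error exceeds a threshold. The function name, the dictionary layout, and the threshold value are hypothetical; the page does not describe the criterion the paper actually uses.

```python
def blocks_to_refine(block_costs, block_size, per_pixel_threshold=8.0):
    """Return sorted block origins whose mean absolute error exceeds the threshold.

    block_costs maps (block_y, block_x) -> summed absolute error for that block.
    The threshold is expressed per pixel, so it is scaled by the block area.
    """
    limit = per_pixel_threshold * block_size * block_size
    return sorted(pos for pos, cost in block_costs.items() if cost > limit)
```

Raising `per_pixel_threshold` sends fewer blocks to the (more expensive) refinement stage, trading accuracy for speed.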
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayer-format direct input processing
Fast block matching motion estimation
Context-aware refinement network correction
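To make "Bayer-format direct input" concrete, the sketch below splits a raw mosaic into four half-resolution colour planes, a common way to present raw sensor data to a network without demosaicing. The RGGB layout, the plane names, and the packing itself are assumptions; the page does not say how the paper feeds Bayer data to its models.

```python
def pack_rggb(raw):
    """Split an H x W Bayer mosaic (RGGB layout assumed) into four H/2 x W/2 planes."""
    h, w = len(raw), len(raw[0])
    if h % 2 or w % 2:
        raise ValueError("mosaic height and width must be even")
    planes = {"R": [], "G1": [], "G2": [], "B": []}
    for y in range(0, h, 2):
        planes["R"].append(raw[y][0::2])       # even row, even column
        planes["G1"].append(raw[y][1::2])      # even row, odd column
        planes["G2"].append(raw[y + 1][0::2])  # odd row, even column
        planes["B"].append(raw[y + 1][1::2])   # odd row, odd column
    return planes
```

No interpolation is performed: every output value is an original sensor sample, which is the reason this kind of input avoids the ISP front-end cost.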
Authors
Haichao Wang
Shenzhen International Graduate School, Tsinghua University
Xinyue Xi
Shenzhen International Graduate School, Tsinghua University
Jiangtao Wen
NYU
Yuxing Han
Tsinghua University
Smart Agriculture, Artificial Intelligence, Video, Communication