A Preprocessing Framework for Video Machine Vision under Compression

📅 2025-12-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video compression methods are primarily optimized for human visual perception and thus often fail to preserve semantic information critical for machine vision tasks. To address this, the paper proposes a machine-vision-oriented neural preprocessing framework: a learnable preprocessor is introduced before standard video encoding, and a differentiable virtual codec enables end-to-end joint optimization of the preprocessor with conventional encoders (e.g., H.264/AVC) without modifying codec standards. A rate-distortion-task loss jointly optimizes bit rate, reconstruction fidelity, and downstream task performance, including object detection and action recognition. Experiments demonstrate that the framework reduces average bit rate by over 15% while maintaining or even improving task accuracy, significantly enhancing the semantic fidelity and utility of compressed video for machine vision applications.

📝 Abstract
There has been a growing trend in compressing and transmitting videos from terminals for machine vision tasks. Nevertheless, most video coding optimization methods focus on minimizing distortion according to human perceptual metrics, overlooking the heightened demands posed by machine vision systems. In this paper, we propose a video preprocessing framework tailored for machine vision tasks to address this challenge. The proposed method incorporates a neural preprocessor that retains crucial information for subsequent tasks, boosting rate-accuracy performance. We further introduce a differentiable virtual codec to provide constraints on rate and distortion during the training stage. We directly apply widely used standard codecs for testing; therefore, our solution can be easily deployed in real-world scenarios. We conducted extensive experiments evaluating our compression method on two typical downstream tasks with various backbone networks. The experimental results indicate that our approach saves over 15% of bitrate compared to the standard codec anchor.
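The rate-distortion-task trade-off described in the abstract amounts to training against a weighted sum of a rate term, a distortion term, and a task term. A minimal sketch in plain Python, where the function name, the loss terms, and the weights `lambda_d` and `lambda_t` are illustrative assumptions, not the paper's actual formulation:

```python
def rd_task_loss(rate_bits, distortion_mse, task_loss,
                 lambda_d=1.0, lambda_t=10.0):
    """Combine rate, distortion, and downstream-task terms into a
    single scalar objective, as in a rate-distortion-task trade-off.
    The weights lambda_d and lambda_t are hypothetical placeholders,
    not values taken from the paper."""
    return rate_bits + lambda_d * distortion_mse + lambda_t * task_loss

# Example: at equal rate and distortion, a lower task loss yields
# a lower overall objective, steering the preprocessor toward
# preserving task-relevant information.
print(rd_task_loss(1000.0, 2.5, 0.8))  # 1010.5
```

In practice the weights control how much bitrate the preprocessor may spend to protect semantic content versus pixel fidelity.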
Problem

Research questions and friction points this paper is trying to address.

Optimizes video compression for machine vision tasks
Enhances rate-accuracy performance over human-centric metrics
Saves bitrate with a practical preprocessing framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural preprocessor retains crucial task information
Differentiable virtual codec constrains rate and distortion
Framework boosts rate-accuracy performance for machine vision
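The key obstacle a differentiable virtual codec must work around is that real codec operations such as quantization have zero gradient almost everywhere. One standard trick for this, shown here as a hypothetical sketch rather than the paper's actual proxy, is the straight-through estimator: quantize in the forward pass but let gradients pass through unchanged.

```python
def straight_through_round(x):
    """Forward pass: returns round(x), a non-differentiable step.
    In an autograd framework, the (round(x) - x) term would be
    detached from the graph, so the backward pass sees an identity
    function -- the 'straight-through' trick that differentiable
    codec proxies commonly rely on. This is an illustrative sketch,
    not the paper's virtual codec."""
    return x + (round(x) - x)  # numerically equal to round(x)

print(straight_through_round(3.7))  # 4.0
```

This lets rate and distortion constraints computed on quantized values still produce useful gradients for the neural preprocessor during training, while a real standard codec is used unchanged at test time.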