Multi-task Just Recognizable Difference for Video Coding for Machines: Database, Model, and Coding Application

📅 2026-04-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing Just Recognizable Difference (JRD) methods are limited to single-task scenarios and cannot support multi-task video coding for machine vision. This work addresses the gap by introducing the first multi-task JRD dataset, covering object detection, instance segmentation, and keypoint detection. It also proposes the attribute-assisted AMT-JRD model, which jointly predicts object-level JRD through a unified architecture comprising a Generalized Feature Extraction Module (GFEM), a Specialized Feature Extraction Module (SFEM), and an Attribute Feature Fusion Module (AFFM). Experimental results show that AMT-JRD achieves a mean absolute error of 3.781 across the three tasks, outperforming the state-of-the-art single-task model by 6.7%. When integrated into the VCM framework, it improves BD-mAP by an average of 3.861% and 7.886% compared with VVC and JPEG, respectively.
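To make the headline error metric concrete, the following minimal sketch computes a per-task mean absolute error and then averages it over the three tasks. The function names, the toy JRD values, and the choice to average per task (rather than over all samples pooled together) are assumptions for illustration; the summary does not state how the paper aggregates the metric.

```python
from statistics import fmean


def mean_absolute_error(predicted, annotated):
    """Mean of |predicted - annotated| over paired JRD values."""
    if not predicted or len(predicted) != len(annotated):
        raise ValueError("inputs must be non-empty and of equal length")
    return fmean(abs(p - a) for p, a in zip(predicted, annotated))


def multi_task_mae(per_task):
    """per_task maps a task name to a (predicted, annotated) pair of lists.

    Returns the per-task MAE and their unweighted average (an assumed
    aggregation, not necessarily the one used in the paper).
    """
    task_mae = {
        task: mean_absolute_error(pred, ann)
        for task, (pred, ann) in per_task.items()
    }
    return task_mae, fmean(task_mae.values())


# Toy values only; these are not taken from the MT-JRD dataset.
toy = {
    "detection": ([30, 34], [32, 31]),
    "segmentation": ([28], [27]),
    "keypoint": ([40, 36], [37, 36]),
}
task_mae, average_mae = multi_task_mae(toy)
```

With these toy inputs, the per-task errors are 2.5, 1.0, and 1.5.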

📝 Abstract

Just Recognizable Difference (JRD) boosts coding efficiency for machine vision through visibility threshold modeling, but is currently limited to single-task scenarios. To address this issue, we propose a Multi-Task JRD (MT-JRD) dataset and an Attribute-assisted MT-JRD (AMT-JRD) model for Video Coding for Machines (VCM), improving both prediction accuracy and coding efficiency. First, we construct a dataset comprising 27,264 JRD annotations from machines, supporting three representative tasks: object detection, instance segmentation, and keypoint detection. Second, we propose the AMT-JRD prediction model, which integrates a Generalized Feature Extraction Module (GFEM) and a Specialized Feature Extraction Module (SFEM) to facilitate joint learning across multiple tasks. Third, we incorporate object attribute information into object-wise JRD prediction through the Attribute Feature Fusion Module (AFFM), which introduces prior knowledge about object size and location. This design compensates for the limitations of relying solely on image features and enhances the model's capacity to represent the perceptual mechanisms of machine vision. Finally, we apply the AMT-JRD model to VCM, where the predicted JRDs are used to reduce the coding bit rate while preserving accuracy across multiple machine vision tasks. Extensive experimental results demonstrate that AMT-JRD achieves precise and robust multi-task prediction, with a mean absolute error of 3.781 and an error variance of 5.332 across the three tasks, outperforming the state-of-the-art single-task prediction model by 6.7% and 6.3%, respectively. Coding experiments further show that, compared to the baseline VVC and JPEG, the AMT-JRD-based VCM improves Bjontegaard Delta-mean Average Precision (BD-mAP) by an average of 3.861% and 7.886%, respectively.
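The abstract says the AFFM injects prior knowledge about object size and location, but does not specify how those attributes are encoded. As a hedged illustration only, the sketch below derives a small attribute vector from a bounding box by normalizing size and center position against the image dimensions. The `Box` type, the `object_attributes` function, and the five chosen features are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixels (hypothetical helper type)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def object_attributes(box, image_w, image_h):
    """Return [rel_width, rel_height, rel_area, center_x, center_y].

    All values are normalized by the image size. This encoding is an
    assumption for illustration; the paper's attribute features may differ.
    """
    if image_w <= 0 or image_h <= 0:
        raise ValueError("image dimensions must be positive")
    return [
        box.w / image_w,                        # relative width
        box.h / image_h,                        # relative height
        (box.w * box.h) / (image_w * image_h),  # fraction of image covered
        (box.x + box.w / 2) / image_w,          # normalized center x
        (box.y + box.h / 2) / image_h,          # normalized center y
    ]


attrs = object_attributes(Box(x=100, y=50, w=200, h=100), image_w=800, image_h=400)
```

In a model such as the one described, a vector like `attrs` would presumably be fused with image features before the JRD prediction head; how that fusion is performed is left unspecified here because the abstract does not describe it.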

Problem

Research questions and friction points this paper is trying to address.

Just Recognizable Difference

Video Coding for Machines

Multi-task Learning

Machine Vision

Coding Efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-task Just Recognizable Difference

Video Coding for Machines

Attribute-assisted Prediction