Image Coding for Machines via Feature-Preserving Rate-Distortion Optimization

📅 2025-04-03
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Image compression for machine vision must account for both downstream task performance and visual quality. This paper proposes a feature-preserving rate-distortion optimization (RDO) framework that addresses both objectives at the encoder. The key contributions are: (i) linearization of the feature extractor via Taylor expansion, which recasts the feature distance as a quadratic metric defined by the Jacobian of the neural network; (ii) a block-wise approximation of that metric, called input-dependent squared error (IDSE), which can be evaluated in the transform domain and combined with the sum of squared errors (SSE); and (iii) a low-complexity approximation of IDSE using Jacobian sketches, which lets task-aware RDO run in standard codecs (e.g., AVC) without decoder modifications. In experiments with AVC, the method achieves up to 10% bit-rate savings over SSE-based RDO at the same computer vision accuracy, with no decoder complexity overhead and a 7% increase in encoder complexity.

📝 Abstract
Many images and videos are primarily processed by computer vision algorithms, with only occasional human inspection. When this content requires compression before processing, e.g., in distributed applications, coding methods must optimize for both visual quality and downstream task performance. We first show that, given the features obtained from the original and the decoded images, one way to reduce the effect of compression on a task loss is to perform rate-distortion optimization (RDO) using the distance between features as the distortion metric. However, directly optimizing such a rate-distortion trade-off requires an iterative workflow of encoding, decoding, and feature evaluation for each coding parameter, which is computationally impractical. We address this problem by simplifying the RDO formulation so that the distortion term can be computed by block-based encoders. We first apply a Taylor expansion to the feature extractor, recasting the feature distance as a quadratic metric involving the Jacobian matrix of the neural network. Then, we replace the linearized metric with a block-wise approximation, which we call input-dependent squared error (IDSE). To reduce computational complexity, we approximate IDSE using Jacobian sketches. The resulting loss can be evaluated block-wise in the transform domain and combined with the sum of squared errors (SSE) to address both visual quality and computer vision performance. Simulations with AVC across multiple feature extractors and downstream neural networks show up to 10% bit-rate savings for the same computer vision accuracy compared to RDO based on SSE, with no decoder complexity overhead and just a 7% encoder complexity increase.
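The linearization step in the abstract can be checked numerically. The sketch below compares the exact feature distance with its first-order Taylor approximation, i.e., the quadratic form built from the Jacobian. The toy extractor `features` (a single `tanh` layer with random weights `W`), the sizes, and the error magnitude are all assumptions chosen for illustration; they are not the networks or settings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_features = 16, 8   # hypothetical sizes (e.g., a flattened 4x4 block)
W = rng.normal(size=(n_features, n_pixels)) / np.sqrt(n_pixels)

def features(x):
    """Toy feature extractor f(x) = tanh(W x); a stand-in for a neural network."""
    return np.tanh(W @ x)

def jacobian(x):
    """Analytic Jacobian of the toy extractor at x: diag(1 - tanh(Wx)^2) W."""
    return (1.0 - np.tanh(W @ x) ** 2)[:, None] * W

def exact_feature_distance(x, x_hat):
    """Squared distance between features of the original and decoded signals."""
    return float(np.sum((features(x) - features(x_hat)) ** 2))

def linearized_feature_distance(x, x_hat):
    """First-order Taylor: f(x_hat) ~ f(x) + J(x) (x_hat - x), so the distance
    becomes the quadratic form e^T J^T J e with e = x_hat - x."""
    e = x_hat - x
    return float(np.sum((jacobian(x) @ e) ** 2))

x = rng.normal(size=n_pixels)
x_hat = x + 1e-3 * rng.normal(size=n_pixels)   # small stand-in coding error

d_exact = exact_feature_distance(x, x_hat)
d_lin = linearized_feature_distance(x, x_hat)
rel_gap = abs(d_exact - d_lin) / d_exact
print(d_exact, d_lin, rel_gap)
```

For small errors the two distances should agree closely. The gap grows with the error magnitude, since the approximation drops second-order terms.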
Problem

Research questions and friction points this paper is trying to address.

Optimize image compression for machine vision tasks
Reduce computational complexity in rate-distortion optimization
Balance visual quality and task performance efficiently
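One way to picture the balance between visual quality and task performance is a per-block Lagrangian decision. The following is a minimal sketch, assuming the combined distortion is a weighted sum `SSE + tau * feature_term`; the weight `tau`, the pixel-domain uniform quantizer, the logarithmic rate proxy, and the helper names `rd_cost` and `choose_step` are hypothetical simplifications, not the paper's encoder, which operates in the transform domain of a standard codec.

```python
import numpy as np

def rd_cost(block, recon, rate_bits, jac, lam, tau):
    """Lagrangian cost D + lam * R, with D = SSE + tau * feature term.

    jac: Jacobian of a feature extractor restricted to this block (assumed given).
    tau: hypothetical weight trading visual quality against feature fidelity.
    """
    e = (recon - block).ravel()
    sse = float(e @ e)                       # visual-quality term
    feat = float(np.sum((jac @ e) ** 2))     # linearized feature distance
    return sse + tau * feat + lam * rate_bits

def choose_step(block, steps, jac, lam, tau=1.0):
    """Return the quantization step with the lowest Lagrangian cost."""
    best_cost, best_q = np.inf, None
    for q in steps:
        levels = np.round(block / q)         # uniform quantization (pixel domain)
        recon = levels * q
        # Crude rate proxy (assumption): bits grow with the log of level magnitude.
        rate_bits = float(np.sum(np.log2(1.0 + np.abs(levels))))
        cost = rd_cost(block, recon, rate_bits, jac, lam, tau)
        if cost < best_cost:
            best_cost, best_q = cost, q
    return best_q

rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(4, 4)).astype(float)   # toy 4x4 block
jac = rng.normal(size=(8, 16)) / 4.0                       # toy block Jacobian
steps = [1, 4, 16]

q_quality = choose_step(block, steps, jac, lam=0.0)   # rate is free
q_rate = choose_step(block, steps, jac, lam=1e4)      # rate dominates
print(q_quality, q_rate)
```

With `lam=0.0` the finest step wins because it reconstructs the integer block exactly; with a large `lam` the coarsest step wins because its rate proxy is lowest. Intermediate values of `lam` and `tau` move the decision between those extremes.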
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature distance as distortion metric in RDO
Block-based IDSE approximation for efficiency
Jacobian sketches to reduce complexity
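The idea behind sketching can be illustrated with a random projection. The example below assumes a Gaussian sketching matrix `S` with `k` rows, scaled so that the sketched quadratic form is an unbiased estimate of the full one. The summary does not state the paper's exact sketch construction, so the sizes and the helper `sketch_jacobian` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_pixels, k = 256, 16, 8   # hypothetical sizes, with k << n_features

J = rng.normal(size=(n_features, n_pixels))   # stand-in for a block's Jacobian
e = rng.normal(size=n_pixels)                 # stand-in for a coding error

def sketch_jacobian(J, k, rng):
    """Return S @ J, where S is k x n_features Gaussian with E[S^T S] = I."""
    S = rng.normal(size=(k, J.shape[0])) / np.sqrt(k)
    return S @ J

# Full quadratic form e^T J^T J e: needs all n_features rows of J.
full = float(np.sum((J @ e) ** 2))

# Sketched form e^T (SJ)^T (SJ) e: needs only k rows per evaluation.
estimates = [
    float(np.sum((sketch_jacobian(J, k, rng) @ e) ** 2)) for _ in range(2000)
]
mean_est = float(np.mean(estimates))
print(full, mean_est)
```

A single sketch is a noisy estimate; averaging many draws here only demonstrates that the estimator is unbiased. In an automatic-differentiation framework, each row of `S @ J` is a vector-Jacobian product, so the full Jacobian need not be formed explicitly; whether the paper computes its sketches this way is not stated in this summary.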
Samuel Fernández-Menduiña
Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, 90089, United States
Eduardo Pavez
Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, 90089, United States
Antonio Ortega
Dean's Professor of Electrical and Computer Engineering, University of Southern California
Signal Processing, Graph Signal Processing