🤖 AI Summary
To jointly optimize compression efficiency and task accuracy when intermediate features are transmitted for remote machine vision analysis, this paper proposes an end-to-end bit allocation method driven by multiscale feature importance. We first show that feature importance varies across scales, object sizes, and image instances. We then design a Multiscale Feature Importance Prediction (MFIP) module and establish a differentiable joint objective that combines task loss and bitrate, enabling semantics-aware adaptive bit allocation. The method is compatible with mainstream learned image compression frameworks, including ELIC and LIC-TCM. With ELIC, experiments show average bitrate reductions of 38.2%, 17.2%, and 36.5% relative to the ELIC anchor on object detection, instance segmentation, and keypoint detection, respectively. With LIC-TCM, the method achieves average bitrate savings of 18.1%, 19.9%, and 19.6% on the same three tasks, supporting its generalizability across tasks and base codecs.
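The joint objective described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the MFIP module emits one logit per feature scale, that a softmax turns these logits into importance weights, and that the objective is an importance-weighted sum of per-scale task losses plus λ times the total rate. The names `softmax` and `joint_objective` and this exact loss form are hypothetical, not the paper's formulation. A real implementation would compute these terms in an autodiff framework so that gradients reach both the codec and the MFIP module.

```python
import math
from typing import Sequence


def softmax(scores: Sequence[float]) -> list[float]:
    """Numerically stable softmax: turns per-scale logits into weights summing to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def joint_objective(
    importance_logits: Sequence[float],  # hypothetical MFIP output, one logit per scale
    task_losses: Sequence[float],        # per-scale task loss D_s from compressed features
    rates: Sequence[float],              # per-scale bitrate R_s
    lam: float,                          # rate/accuracy trade-off weight (assumed)
) -> float:
    """Assumed form: L = sum_s w_s * D_s + lam * sum_s R_s, with w = softmax(logits).

    Because each D_s is scaled by w_s, minimizing L penalizes accuracy loss more
    heavily at important scales, which pushes the codec to spend bits there.
    """
    weights = softmax(importance_logits)
    weighted_task = sum(w * d for w, d in zip(weights, task_losses))
    return weighted_task + lam * sum(rates)
```

With equal logits, for example, `joint_objective([0.0, 0.0], [1.0, 3.0], [0.5, 0.5], 0.1)` averages the two task losses (2.0) and adds 0.1 × 1.0 for the rate term.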
📝 Abstract
Feature Coding for Machines (FCM) aims to compress intermediate features effectively for remote intelligent analytics, which is crucial for future intelligent visual applications. In this paper, we propose a Multiscale Feature Importance-based Bit Allocation (MFIBA) method for end-to-end FCM. First, we find that the importance of features for machine vision tasks varies with feature scale, object size, and image instance. Based on this finding, we propose a Multiscale Feature Importance Prediction (MFIP) module that predicts an importance weight for each feature scale. Second, we propose a task loss-rate model that relates the task accuracy loss caused by using compressed features to the bitrate spent encoding those features. Finally, we develop MFIBA for end-to-end FCM, which allocates coding bits across multiscale features according to their importance. Experimental results show that, when combined with a retrained Efficient Learned Image Compression (ELIC) codec, MFIBA achieves an average bitrate saving of 38.202% on object detection compared with the ELIC anchor. It also achieves average feature bitrate savings of 17.212% and 36.492% on instance segmentation and keypoint detection, respectively. When applied to LIC-TCM, MFIBA achieves average bitrate savings of 18.103%, 19.866%, and 19.597% on the three machine vision tasks, respectively, which indicates that it generalizes and adapts well to different machine vision tasks and FCM base codecs.
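How importance weights can steer bit allocation under a fixed budget can be sketched with a greedy marginal-gain allocator. The sketch assumes a hypothetical exponential task loss-rate model per scale, D_s(R_s) = a_s · 2^(-2R_s), borrowed from classical rate-distortion theory rather than taken from the paper, together with hypothetical helpers `task_loss` and `allocate_bits`. The paper's end-to-end MFIBA need not work this way; the loop only shows the intuition that higher-importance scales should receive more bits.

```python
def task_loss(a: float, rate: float) -> float:
    """Hypothetical task loss-rate model for one scale: D(R) = a * 2^(-2R)."""
    return a * 2.0 ** (-2.0 * rate)


def allocate_bits(
    weights: list[float],   # importance weight per scale (e.g. from an MFIP-like predictor)
    scale_a: list[float],   # assumed model coefficient a_s per scale
    total_bits: float,      # total bit budget to distribute
    step: float = 0.25,     # allocation granularity in bits
) -> list[float]:
    """Greedy marginal allocation: repeatedly give `step` bits to the scale whose
    importance-weighted task loss would drop the most."""
    if step <= 0:
        raise ValueError("step must be positive")
    rates = [0.0] * len(weights)
    for _ in range(int(round(total_bits / step))):
        # Weighted loss reduction each scale would gain from one more step of bits.
        gains = [
            w * (task_loss(a, r) - task_loss(a, r + step))
            for w, a, r in zip(weights, scale_a, rates)
        ]
        best = max(range(len(gains)), key=gains.__getitem__)
        rates[best] += step
    return rates
```

For example, `allocate_bits([0.7, 0.2, 0.1], [1.0, 1.0, 1.0], 3.0)` spends the full 3-bit budget and gives the most bits to the highest-weight scale. A greedy rule is used because, for separable convex losses with rates restricted to multiples of `step`, it reaches the optimum of the discretized problem.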