Machines Serve Human: A Novel Variable Human-machine Collaborative Compression Framework

📅 2025-11-12

📈 Citations: 0

✨ Influential: 0

career value

271K/year

🤖 AI Summary

Existing methods model human-machine collaborative compression based on human visual processing, failing to simultaneously satisfy machine vision’s low-information requirements and human vision’s high-fidelity demands. This paper proposes the first machine-vision-oriented compression framework that inversely supports human visual reconstruction. Specifically, we first design a lightweight, task-aware encoder to extract semantics-critical features for downstream machine vision tasks; second, we introduce a diffusion-prior-guided semantic aggregation module to progressively restore perceptually essential details for human viewing; third, we devise a plug-and-play variable-bitrate strategy enabling multi-task adaptive bit allocation. Experiments demonstrate that our method significantly outperforms state-of-the-art approaches on both machine vision performance (e.g., detection and recognition accuracy) and human visual quality (PSNR/MS-SSIM), achieving unified low-bitrate coding (37% reduction) and high fidelity (2.1 dB PSNR gain). These results validate the effectiveness and generality of the “machine-first, human-enhanced” paradigm.

Technology Category

Application Category

📝 Abstract

Human-machine collaborative compression has been receiving increasing research efforts for reducing image/video data, serving as the basis for both human perception and machine intelligence. Existing collaborative methods are dominantly built upon the de facto human-vision compression pipeline, witnessing deficiency on complexity and bit-rates when aggregating the machine-vision compression. Indeed, machine vision solely focuses on the core regions within the image/video, requiring much less information compared with the compressed information for human vision. In this paper, we thus set out the first successful attempt by a novel collaborative compression method based on the machine-vision-oriented compression, instead of human-vision pipeline. In other words, machine vision serves as the basis for human vision within collaborative compression. A plug-and-play variable bit-rate strategy is also developed for machine vision tasks. Then, we propose to progressively aggregate the semantics from the machine-vision compression, whilst seamlessly tailing the diffusion prior to restore high-fidelity details for human vision, thus named as diffusion-prior based feature compression for human and machine visions (Diff-FCHM). Experimental results verify the consistently superior performances of our Diff-FCHM, on both machine-vision and human-vision compression with remarkable margins. Our code will be released upon acceptance.

Problem

Research questions and friction points this paper is trying to address.

Optimizing compression for both human and machine vision tasks

Reducing complexity and bit-rates in collaborative compression frameworks

Developing variable bit-rate strategy for machine vision applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine-vision-oriented compression as collaborative basis

Plug-and-play variable bit-rate strategy for machine tasks

Diffusion-prior based feature compression for dual visions

🔎 Similar Papers

No similar papers found.