Machines Serve Human: A Novel Variable Human-machine Collaborative Compression Framework

📅 2025-11-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods model human-machine collaborative compression based on human visual processing, failing to simultaneously satisfy machine vision’s low-information requirements and human vision’s high-fidelity demands. This paper proposes the first machine-vision-oriented compression framework that inversely supports human visual reconstruction. Specifically, we first design a lightweight, task-aware encoder to extract semantics-critical features for downstream machine vision tasks; second, we introduce a diffusion-prior-guided semantic aggregation module to progressively restore perceptually essential details for human viewing; third, we devise a plug-and-play variable-bitrate strategy enabling multi-task adaptive bit allocation. Experiments demonstrate that our method significantly outperforms state-of-the-art approaches on both machine vision performance (e.g., detection and recognition accuracy) and human visual quality (PSNR/MS-SSIM), achieving unified low-bitrate coding (37% reduction) and high fidelity (2.1 dB PSNR gain). These results validate the effectiveness and generality of the “machine-first, human-enhanced” paradigm.
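For readers unfamiliar with the fidelity metric cited in the summary, the following is a minimal sketch of how PSNR is computed for 8-bit pixel data and how a gain in dB maps to mean squared error. The function names (`mse`, `psnr`, `mse_ratio_for_gain`) and the pixel values are illustrative assumptions and do not come from the paper.

```python
import math

def mse(ref, rec):
    """Mean squared error between two equal-length pixel sequences."""
    if len(ref) != len(rec):
        raise ValueError("inputs must have the same length")
    return sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)

def psnr(ref, rec, peak=255.0):
    """PSNR in dB: 10 * log10(peak^2 / MSE). Infinite for identical inputs."""
    err = mse(ref, rec)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)

# Toy data (illustrative values, not taken from the paper).
reference = [52, 55, 61, 66, 70, 61, 64, 73]
decoded = [54, 55, 60, 67, 69, 62, 63, 75]

print(f"PSNR: {psnr(reference, decoded):.2f} dB")
print(f"A 2.1 dB gain means about {mse_ratio_for_gain(2.1):.2f}x lower MSE")
```

Because PSNR is logarithmic in the error, a fixed gain in dB corresponds to a fixed multiplicative reduction in MSE at the same peak value, regardless of the starting quality.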

📝 Abstract
Human-machine collaborative compression has attracted increasing research effort as a way to reduce image/video data while serving both human perception and machine intelligence. Existing collaborative methods are predominantly built on the de facto human-vision compression pipeline, which incurs excess complexity and bit-rate when machine-vision compression is added on top. Machine vision, however, focuses only on the core regions of an image/video and requires much less information than is compressed for human vision. In this paper, we therefore make the first successful attempt at a collaborative compression method built on machine-vision-oriented compression rather than the human-vision pipeline. In other words, machine vision serves as the basis for human vision within collaborative compression. A plug-and-play variable bit-rate strategy is also developed for machine vision tasks. We then propose to progressively aggregate the semantics from the machine-vision compression while seamlessly tailing the diffusion prior to restore high-fidelity details for human vision; the method is accordingly named diffusion-prior based feature compression for human and machine visions (Diff-FCHM). Experimental results verify that Diff-FCHM is consistently superior, by remarkable margins, on both machine-vision and human-vision compression. Our code will be released upon acceptance.
Problem

Research questions and friction points this paper is trying to address.

Optimizing compression for both human and machine vision tasks
Reducing complexity and bit-rates in collaborative compression frameworks
Developing variable bit-rate strategy for machine vision applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine-vision-oriented compression as collaborative basis
Plug-and-play variable bit-rate strategy for machine tasks
Diffusion-prior based feature compression for dual visions
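One common way to obtain several bit-rates from a single codec is to scale the quantization step applied to the latent features. This page does not say whether Diff-FCHM works this way, so the sketch below is only a generic illustration of the rate/distortion trade-off that any variable bit-rate strategy has to navigate, not the paper's method. All names (`quantize`, `dequantize`, `empirical_entropy_bits`, `step`) and the synthetic Gaussian features are assumptions made for the example.

```python
import math
import random
from collections import Counter

def quantize(features, step):
    """Uniform scalar quantization: map each value to an integer index."""
    return [round(x / step) for x in features]

def dequantize(indices, step):
    """Reconstruct approximate values from quantization indices."""
    return [i * step for i in indices]

def empirical_entropy_bits(indices):
    """Zeroth-order entropy in bits per symbol, a rough proxy for bit-rate."""
    counts = Counter(indices)
    n = len(indices)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mean_squared_error(a, b):
    """Average squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Synthetic stand-in for encoder features (hypothetical, seeded for repeatability).
rng = random.Random(0)
features = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

results = {}
for step in (0.25, 0.5, 1.0, 2.0):
    idx = quantize(features, step)
    rec = dequantize(idx, step)
    results[step] = (empirical_entropy_bits(idx), mean_squared_error(features, rec))
    rate, dist = results[step]
    print(f"step={step:<4} rate~{rate:.2f} bits/symbol  mse={dist:.4f}")
```

A larger step lowers the estimated rate and raises the distortion, so a single scalar such as `step` can act as the rate-control knob in this toy setting.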
Zifu Zhang
School of Electronic and Information Engineering, Beihang University, Beijing 100191, China
Shengxi Li
School of Electronic and Information Engineering, Beihang University, Beijing 100191, China; State Key Laboratory of Virtual Reality Technology and Systems, Beihang University
Xiancheng Sun
School of Electronic and Information Engineering, Beihang University, Beijing 100191, China
Mai Xu
Beihang University, Tsinghua University, Imperial College London
Zhengyuan Liu
Institute for Infocomm Research (I2R) - A*STAR; IEEE Senior Member.
Natural Language Processing, Artificial Intelligence, Human-Centered AI
Jingyuan Xia
National University of Defense Technology
Non-convex optimization, Statistical machine learning, Image restoration