🤖 AI Summary
Existing ROI-based image compression methods rely on pre-defined, fixed regions, limiting their adaptability to diverse user-specific semantic requirements and quality preferences. To address this, we propose a customizable, text-driven ROI compression framework: (1) semantic-level ROI masks are generated under textual guidance; (2) a tunable mask intensity mechanism enables dynamic rate-distortion trade-offs between ROI and non-ROI regions; and (3) a mask-aware attention module in the latent space refines feature representation. The framework supports end-to-end training without requiring manual ROI annotations. Experiments demonstrate that, across diverse user-provided textual descriptions, our method significantly improves ROI reconstruction fidelity while preserving overall compression efficiency, outperforming conventional fixed-ROI approaches across all metrics. To the best of our knowledge, this is the first work to achieve semantically controllable, quality-adjustable, personalized deep image compression.
📝 Abstract
Region of Interest (ROI)-based image compression optimizes bit allocation by prioritizing the ROI for higher-quality reconstruction. However, as users (including human clients and downstream machine tasks) become more diverse, ROI-based image compression needs to be customizable to support varied preferences. For example, different users may define distinct ROIs or require different quality trade-offs between ROI and non-ROI. Existing ROI-based image compression schemes predefine the ROI, making it unchangeable, and lack effective mechanisms to balance reconstruction quality between ROI and non-ROI. This work proposes a paradigm for customizable ROI-based deep image compression. First, we develop a Text-controlled Mask Acquisition (TMA) module, which allows users to customize their ROI for compression simply by inputting the corresponding semantic text, so that the encoder is controlled by text. Second, we design a Customizable Value Assign (CVA) mechanism, which masks the non-ROI to a user-chosen extent rather than a constant one, to manage the reconstruction quality trade-off between ROI and non-ROI. Finally, we present a Latent Mask Attention (LMA) module, in which the latent spatial prior of the mask and the latent Rate-Distortion Optimization (RDO) prior of the image are extracted and fused in the latent space, and then used to optimize the latent representation of the source image. Experimental results demonstrate that the proposed paradigm effectively addresses the need for customizable ROI definition and mask acquisition, as well as the management of the reconstruction quality trade-off between ROI and non-ROI.
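To make the idea of masking the non-ROI "to a user-chosen extent" concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a binary ROI mask is already available (for example from a text-guided segmentation step), that pixel values lie in [0, 1], and that the mask is applied by element-wise multiplication; the abstract does not state any of these details. The names `assign_mask_values`, `apply_soft_mask`, and `non_roi_value` are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def assign_mask_values(binary_mask: List[List[int]], non_roi_value: float) -> Grid:
    """Turn a binary ROI mask into a soft mask.

    ROI positions (mask == 1) keep weight 1.0; non-ROI positions get the
    user-chosen weight `non_roi_value` (hypothetical parameter, assumed in [0, 1]).
    """
    if not 0.0 <= non_roi_value <= 1.0:
        raise ValueError("non_roi_value must lie in [0, 1]")
    return [[1.0 if m else non_roi_value for m in row] for row in binary_mask]


def apply_soft_mask(image: Grid, soft_mask: Grid) -> Grid:
    """Scale each pixel by its mask weight (assumed multiplicative masking)."""
    return [
        [pixel * weight for pixel, weight in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, soft_mask)
    ]


# Toy 2x3 grayscale image; the ROI covers the first two pixels of the top row.
binary_mask = [[1, 1, 0], [0, 0, 0]]
image = [[0.8, 0.6, 0.4], [0.2, 1.0, 0.5]]

soft_mask = assign_mask_values(binary_mask, non_roi_value=0.5)
masked_image = apply_soft_mask(image, soft_mask)
```

Under these assumptions, setting `non_roi_value` to 0.0 would suppress the non-ROI entirely and setting it to 1.0 would leave the image unchanged, so intermediate values give the user a single knob for how strongly the non-ROI is attenuated before compression.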