On Disentangled Training for Nonlinear Transform in Learned Image Compression

📅 2025-01-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
In learned image compression (LIC), training the nonlinear transforms is notoriously slow (often exceeding two weeks) because the transforms must simultaneously learn energy compaction. This paper disentangles energy compaction into two components, feature decorrelation and uneven energy modulation, and offloads both to a lightweight, plug-and-play linear auxiliary transform (AuxT) built from wavelet-based linear shortcuts (WLSs). Each WLS pairs wavelet-based downsampling and an orthogonal linear projection for feature decorrelation with subband-aware scaling for uneven energy modulation, supplying a coarse approximation so the nonlinear transforms need only fit fine details. Jointly trained with diverse LIC backbones, AuxT roughly halves training time while achieving an average 1% BD-rate reduction and comparable or superior rate-distortion performance against state-of-the-art baselines.
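The headline gain is reported in BD-rate (Bjøntegaard delta rate), the average bitrate change at matched quality between two rate-distortion curves. As a reference for what that 1% figure measures, here is a minimal sketch of the standard computation (a third-order polynomial fit in the log-rate domain, integrated over the overlapping quality range, here taken as PSNR); the function name and arguments are illustrative, not from the paper.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate: average % bitrate change of the test
    codec vs. the anchor at equal PSNR (negative = bitrate savings)."""
    # Bjontegaard's method fits log-rate as a cubic function of PSNR.
    pa = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    pt = np.polyfit(psnr_test, np.log(rate_test), 3)
    # Integrate both fits over the overlapping PSNR interval.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    ia, it = np.polyint(pa), np.polyint(pt)
    avg_log_diff = ((np.polyval(it, hi) - np.polyval(it, lo))
                    - (np.polyval(ia, hi) - np.polyval(ia, lo))) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0  # percent
```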

📝 Abstract
Learned image compression (LIC) has demonstrated superior rate-distortion (R-D) performance compared to traditional codecs, but it suffers from training inefficiency: training a state-of-the-art model from scratch can take more than two weeks. Existing LIC methods overlook the slow convergence caused by compacting energy while learning nonlinear transforms. In this paper, we first reveal that such energy compaction consists of two components, i.e., feature decorrelation and uneven energy modulation. On this basis, we propose a linear auxiliary transform (AuxT) to disentangle energy compaction from the training of nonlinear transforms. The proposed AuxT obtains a coarse approximation that achieves efficient energy compaction, so that distribution fitting with the nonlinear transforms can be simplified to fine details. We then develop wavelet-based linear shortcuts (WLSs) for AuxT that leverage wavelet-based downsampling and orthogonal linear projection for feature decorrelation, and subband-aware scaling for uneven energy modulation. AuxT is lightweight and plug-and-play, and can be integrated into diverse LIC models to address the slow convergence issue. Experimental results demonstrate that the proposed approach accelerates the training of LIC models by a factor of two while achieving an average 1% BD-rate reduction. To the best of our knowledge, this is one of the first successful attempts to significantly improve the convergence of LIC with comparable or superior rate-distortion performance. Code will be released at https://github.com/qingshi9974/AuxT
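The abstract specifies each WLS as wavelet-based downsampling plus an orthogonal linear projection (for feature decorrelation) and subband-aware scaling (for uneven energy modulation). Below is a minimal PyTorch sketch of one such shortcut under those assumptions; the class name WaveletLinearShortcut, the Haar wavelet choice, and the orthogonal initialization of the 1x1 projection are illustrative guesses at the design, not the paper's released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WaveletLinearShortcut(nn.Module):
    """Hypothetical sketch of one WLS: fixed Haar downsampling for
    decorrelation, an orthogonally initialized 1x1 projection, and a
    learnable per-subband scale for uneven energy modulation."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Orthonormal 2x2 Haar analysis filters (LL, LH, HL, HH),
        # applied depthwise to every input channel with stride 2.
        haar = torch.tensor([[[0.5,  0.5], [ 0.5,  0.5]],   # LL
                             [[0.5,  0.5], [-0.5, -0.5]],   # LH
                             [[0.5, -0.5], [ 0.5, -0.5]],   # HL
                             [[0.5, -0.5], [-0.5,  0.5]]])  # HH
        self.register_buffer("haar", haar.unsqueeze(1))     # (4, 1, 2, 2)
        # One learnable gain per (subband, channel): subband-aware scaling.
        self.subband_scale = nn.Parameter(torch.ones(4, in_ch, 1, 1))
        # Orthogonal 1x1 projection across the 4*in_ch subband channels.
        self.proj = nn.Conv2d(4 * in_ch, out_ch, kernel_size=1, bias=False)
        nn.init.orthogonal_(self.proj.weight)

    def forward(self, x):                    # x: (B, C, H, W), H and W even
        b, c, h, w = x.shape
        # Depthwise Haar transform: each channel -> 4 subbands at half size.
        y = F.conv2d(x, self.haar.repeat(c, 1, 1, 1), stride=2, groups=c)
        y = y.view(b, c, 4, h // 2, w // 2).transpose(1, 2)  # (B, 4, C, ., .)
        y = y * self.subband_scale           # uneven energy modulation
        y = y.reshape(b, 4 * c, h // 2, w // 2)
        return self.proj(y)                  # decorrelating linear projection
```

In use, such a shortcut would run in parallel with a nonlinear downsampling stage of the analysis transform, e.g. y = stage(x) + wls(x), so the stage only has to refine residual fine details on top of a linearly compacted baseline.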
Problem

Research questions and friction points this paper is trying to address.

Learning-based Image Compression
Training Efficiency
Energy Concentration Issue
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear Auxiliary Transform (AuxT)
Learning-based Image Compression (LIC)
Training Acceleration