Sandwiched Compression: Repurposing Standard Codecs with Neural Network Wrappers

📅 2024-02-08

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Standard image and video codecs adapt poorly to content they were not designed for (e.g., HDR, computer graphics, sources with a different number of channels or a higher resolution) and to perceptual distortion measures. To address this, we propose the "sandwich" architecture: a conventional codec placed between learnable neural pre- and post-processing networks, which are trained jointly through a differentiable codec proxy to minimize a rate-distortion loss. We establish optimality properties for sandwiched compression and design proxies that approximate current standard codecs. The framework covers multi-channel sources, higher-resolution and higher-dynamic-range content, and training with perceptual metrics such as LPIPS and VMAF. Experiments show gains of up to 9 dB or bitrate reductions of up to 30% compared with alternative adaptations, with the largest improvements when the task lies well outside the codec's design scope.

📝 Abstract

We propose sandwiching standard image and video codecs between pre- and post-processing neural networks. The networks are jointly trained through a differentiable codec proxy to minimize a given rate-distortion loss. This sandwich architecture not only improves the standard codec's performance on its intended content, but more importantly, adapts the codec to other types of image/video content and to other distortion measures. The sandwich learns to transmit "neural code images" that optimize and improve overall rate-distortion performance, with the improvements becoming significant especially when the overall problem is well outside of the scope of the codec's design. We apply the sandwich architecture to standard codecs with mismatched sources transporting different numbers of channels, higher resolution, higher dynamic range, computer graphics, and with perceptual distortion measures. The results demonstrate substantial improvements (up to 9 dB gains or up to 30% bitrate reductions) compared to alternative adaptations. We establish optimality properties for sandwiched compression and design differentiable codec proxies approximating current standard codecs. We further analyze model complexity, visual quality under perceptual metrics, as well as sandwich configurations that offer interesting potentials in video compression and streaming.
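
To make the data flow in the abstract concrete, here is a minimal toy sketch of a sandwich forward pass and its rate-distortion loss. It is not the authors' implementation. Everything in it is an illustrative assumption: the pre- and post-processors are single scalar gains instead of neural networks, the "codec" is a uniform quantizer, the training-time proxy replaces rounding with additive uniform noise, and the rate estimate `log2(1 + |y| / step)` is a hypothetical smooth stand-in for a bit count. All function names are made up for this example.

```python
import math
import random


def preprocess(x, gain):
    # Stand-in for the neural pre-processor (assumption: one learnable gain).
    return [gain * v for v in x]


def codec_proxy(y, step, training, rng):
    # Stand-in for a standard codec: uniform quantization with step size `step`.
    if training:
        # Training: additive uniform noise in [-step/2, step/2] is a common
        # differentiable surrogate for rounding (assumption, not the paper's proxy).
        return [v + rng.uniform(-0.5, 0.5) * step for v in y]
    # Inference: hard rounding to the nearest multiple of `step`.
    return [step * round(v / step) for v in y]


def rate_proxy(y, step):
    # Hypothetical smooth rate estimate, averaged per sample.
    return sum(math.log2(1.0 + abs(v) / step) for v in y) / len(y)


def postprocess(y_hat, gain):
    # Stand-in for the neural post-processor (assumption: one learnable gain).
    return [gain * v for v in y_hat]


def rd_loss(x, pre_gain, post_gain, step, lam, training=True, seed=0):
    # Full sandwich: pre-processor -> codec (proxy) -> post-processor.
    rng = random.Random(seed)
    y = preprocess(x, pre_gain)
    y_hat = codec_proxy(y, step, training, rng)
    x_hat = postprocess(y_hat, post_gain)
    # Distortion: mean squared error between source and reconstruction.
    distortion = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    rate = rate_proxy(y, step)
    # Joint objective: distortion plus lambda-weighted rate.
    return distortion + lam * rate, distortion, rate
```

In this toy setting, raising `pre_gain` (with `post_gain` set to its inverse) lets more of the signal survive quantization, which lowers distortion but raises the estimated rate. Joint training of the two wrappers searches for the best such trade-off at a given `lam`.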

Problem

Research questions and friction points this paper is trying to address.

Enhancing standard codecs with neural networks

Adapting codecs to varied image/video content

Optimizing rate-distortion performance with neural wrappers

Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural network wrappers enhance codecs

Differentiable codec proxy optimizes training

Sandwich architecture adapts to diverse content
