Sandwiched Compression: Repurposing Standard Codecs with Neural Network Wrappers

2024-02-08
arXiv.org
Citations: 0
Influential: 0
AI Summary
Standard image/video codecs struggle to adapt to emerging content types (e.g., HDR, synthetic graphics, cross-modal data) and to perceptual distortion metrics. To address this, we propose the “Neural Sandwich” architecture, which places a conventional codec between learnable neural preprocessing and postprocessing modules and trains them end to end through a differentiable codec proxy to optimize the rate-distortion trade-off. We establish optimality properties for sandwiched compression. The framework supports multi-channel adaptation, super-resolution enhancement, and perceptual training using LPIPS and VMAF. Experiments demonstrate gains of up to 9 dB or bitrate reductions of up to 30% relative to alternative adaptations, with the largest improvements in scenarios outside the codec's design scope.

๐Ÿ“ Abstract
We propose sandwiching standard image and video codecs between pre- and post-processing neural networks. The networks are jointly trained through a differentiable codec proxy to minimize a given rate-distortion loss. This sandwich architecture not only improves the standard codec's performance on its intended content, but more importantly, adapts the codec to other types of image/video content and to other distortion measures. The sandwich learns to transmit "neural code images" that improve overall rate-distortion performance, with the improvements becoming especially significant when the overall problem is well outside the scope of the codec's design. We apply the sandwich architecture to standard codecs with mismatched sources transporting different numbers of channels, higher resolution, higher dynamic range, computer graphics, and with perceptual distortion measures. The results demonstrate substantial improvements (up to 9 dB gains or up to 30% bitrate reductions) compared to alternative adaptations. We establish optimality properties for sandwiched compression and design differentiable codec proxies approximating current standard codecs. We further analyze model complexity, visual quality under perceptual metrics, and sandwich configurations that offer interesting potential for video compression and streaming.
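The training setup described in the abstract can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration and is not the paper's method: the pre- and post-processors are reduced to a single scalar `gain` and its inverse, the "codec" is a uniform scalar quantizer, the differentiable proxy replaces rounding with additive uniform noise, and the rate estimate is a smooth logarithmic function. The names `quantize`, `quantize_proxy`, `rate_proxy`, and `sandwich_loss` are hypothetical.

```python
import math
import random


def quantize(x, step):
    """Hard uniform quantizer standing in for the real codec (not differentiable)."""
    return [step * round(v / step) for v in x]


def quantize_proxy(x, step, rng):
    """Assumed proxy: replace rounding with uniform noise in [-step/2, step/2]."""
    return [v + rng.uniform(-step / 2, step / 2) for v in x]


def rate_proxy(x, step):
    """Assumed smooth rate estimate in bits: sum of log2(1 + |v| / step)."""
    return sum(math.log2(1.0 + abs(v) / step) for v in x)


def sandwich_loss(source, gain, step, lam, rng=None):
    """Return (D + lam * R, D, R) for a one-parameter pre/post-processor pair.

    With rng=None the hard quantizer is used (inference path); with an rng
    the noisy proxy is used (training path).
    """
    code = [gain * v for v in source]  # toy preprocessor
    if rng is None:
        decoded = quantize(code, step)
    else:
        decoded = quantize_proxy(code, step, rng)
    recon = [v / gain for v in decoded]  # toy postprocessor
    distortion = sum((a - b) ** 2 for a, b in zip(source, recon)) / len(source)
    rate = rate_proxy(code, step)
    return distortion + lam * rate, distortion, rate


if __name__ == "__main__":
    source = [0.1, 0.2, 0.3]
    for gain in (1.0, 10.0):
        loss, d, r = sandwich_loss(source, gain, step=1.0, lam=0.01)
        print(f"gain={gain}: loss={loss:.4f} distortion={d:.4f} rate={r:.3f}")
```

In this toy setting, raising `gain` lowers the distortion and raises the rate estimate, which is the trade-off the Lagrange weight `lam` arbitrates. In the actual sandwich, neural networks take the place of the scalar gain and a proxy approximating a standard codec takes the place of the noisy quantizer.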
Problem

Research questions and friction points this paper is trying to address.

Enhancing standard codecs with neural networks
Adapting codecs to varied image/video content
Optimizing rate-distortion performance with neural wrappers
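
One common way to write the rate-distortion objective behind these questions is the Lagrangian form below. The notation is assumed here for illustration and is not taken from the paper: $f_\theta$ is the preprocessor, $g_\phi$ the postprocessor, $\tilde{C}$ the differentiable codec proxy, $\tilde{R}$ its rate estimate, $d$ the chosen distortion measure, and $\lambda \ge 0$ the weight trading rate against distortion.

```latex
\min_{\theta,\,\phi}\;
\mathbb{E}_{x}\!\left[
  d\!\left(x,\; g_\phi\!\big(\tilde{C}(f_\theta(x))\big)\right)
  \;+\; \lambda\, \tilde{R}\!\big(f_\theta(x)\big)
\right]
```

Under this reading, changing the content type changes the distribution of $x$, and changing the quality criterion changes $d$, while the standard codec itself stays fixed.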
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural network wrappers enhance codecs
Differentiable codec proxy optimizes training
Sandwich architecture adapts to diverse content