AI Summary
Traditional transform coding falls short of rate-distortion optimality for multimodal Gaussian mixture sources because a single covariance cannot capture their heterogeneous local geometric structures. This work develops a rate-distortion theory for Gaussian mixture sources and shows that their conditional rate-distortion function is governed by a single reverse water-filling threshold shared by all components. Building on this insight, the authors propose PrismQuant, a framework that transmits the component label losslessly and codes the residual with a component-adaptive KLT followed by entropy-constrained scalar quantization. Because the mixture structure costs only the component labels, PrismQuant closely approaches the theoretical rate-distortion limit: it performs near-optimally on synthetic data and matches or outperforms Transformer-based codecs on real-world CSI data while using a model more than an order of magnitude smaller.
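The shared water level can be illustrated numerically. The sketch below is a minimal illustration under stated assumptions, not the authors' code: it assumes the genie-aided setting (the label C is known to encoder and decoder), that the covariance eigenvalues of every component are given, and that distortion is MSE averaged per source dimension. A single level is expected because, when minimizing the weighted sum of per-mode rates 0.5*log2(lambda/d) under a weighted distortion budget, each component weight p_c multiplies both its rate and its distortion term and cancels in the Lagrangian, so every active mode of every component lands at the same distortion theta. The helper names `conditional_rd` and `label_cost` are hypothetical.

```python
import math

def conditional_rd(weights, eigvals, D, iters=200):
    """Genie-aided conditional RD of a Gaussian mixture (hypothetical helper).

    weights: component probabilities p_c, summing to 1
    eigvals: eigvals[c] lists the n covariance eigenvalues of component c
    D:       target MSE per source dimension (D > 0)
    Returns (theta, rate) with rate in bits per source dimension.
    """
    n = len(eigvals[0])
    lam_max = max(max(lams) for lams in eigvals)

    def distortion(theta):
        # Reverse water-filling: every eigenmode of every component is clipped
        # at the SAME level theta; modes below theta contribute their variance.
        return sum(p * sum(min(theta, lam) for lam in lams)
                   for p, lams in zip(weights, eigvals)) / n

    if D >= distortion(lam_max):
        return lam_max, 0.0  # D exceeds the average variance: zero rate suffices

    lo, hi = 0.0, lam_max
    for _ in range(iters):  # bisection; distortion(theta) is nondecreasing
        mid = 0.5 * (lo + hi)
        if distortion(mid) < D:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)

    # Only modes above the water level receive bits: 0.5*log2(lambda/theta) each.
    rate = sum(p * sum(0.5 * math.log2(lam / theta) for lam in lams if lam > theta)
               for p, lams in zip(weights, eigvals)) / n
    return theta, rate

def label_cost(weights, n):
    """H(C)/n: bits per source dimension spent on sending the label losslessly."""
    return -sum(p * math.log2(p) for p in weights if p > 0) / n

theta, rate = conditional_rd([0.5, 0.5], [[4.0, 1.0], [1.0, 0.25]], D=0.4375)
print(theta, rate, label_cost([0.5, 0.5], n=2))
```

With these inputs the shared level comes out at theta = 0.5 and the conditional rate at 0.625 bits per dimension. The eigenvalue 0.25 sits below the water level and receives no bits, while sending the equiprobable two-way label adds H(C)/n = 0.5 bits per dimension.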
Abstract
For a Gaussian source under mean-squared error (MSE), classical transform coding is rate-distortion (RD) optimal: the Karhunen-Loève transform (KLT) diagonalizes the covariance, reverse waterfilling allocates the bits, and scalar quantization closes the loop. This elegant story breaks down for multimodal sources, where no single covariance can capture heterogeneous local geometries and the RD function loses its closed form. We revisit this problem through Gaussian-mixture sources and develop a constructive RD theory for them. Our key finding is that the mixture structure incurs only the cost of the component label. Conditioned on the active mixture component, each branch is Gaussian; the challenge is allocating bits across heterogeneous branches. We prove that the genie-aided conditional RD function is governed by a single global reverse-waterfilling level shared across all components and eigenmodes. Building on this result, we introduce PrismQuant, which transmits the component label losslessly and encodes the residual using the component-matched KLT followed by scalar quantization, achieving a rate within H(C)/n bits per source dimension of the converse, a gap that vanishes asymptotically. We further develop a practical implementation based on EM-driven Gaussian-mixture learning, component-adaptive KLTs, and entropy-constrained scalar quantization (ECSQ). Experiments on synthetic Gaussian mixtures show that PrismQuant closely approaches the theoretical RD bound, while experiments on real-world channel-state-information (CSI) data demonstrate competitive or superior performance compared with transformer-based learned codecs at a model size more than an order of magnitude smaller.
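The label-plus-residual pipeline (component label, component-matched KLT, scalar quantization) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the mixture parameters are taken as known (the abstract says the practical system learns them by EM), the label is chosen by a maximum-a-posteriori rule, and a plain uniform quantizer with one step size for every mode, scored by empirical entropy, stands in for ECSQ. All function names (`klt_basis`, `map_label`, `encode`, `decode`, `empirical_rate`) and the toy mixture are hypothetical.

```python
import numpy as np
from collections import Counter

def klt_basis(cov):
    # Columns of U are the eigenvectors of cov: the component's KLT basis.
    _, U = np.linalg.eigh(cov)
    return U

def map_label(x, weights, means, covs):
    # Assumption: the label is the most probable component under the GMM.
    scores = []
    for p, mu, S in zip(weights, means, covs):
        d = x - mu
        _, logdet = np.linalg.slogdet(S)
        scores.append(np.log(p) - 0.5 * (logdet + d @ np.linalg.solve(S, d)))
    return int(np.argmax(scores))

def encode(x, weights, means, covs, step):
    c = map_label(x, weights, means, covs)
    y = klt_basis(covs[c]).T @ (x - means[c])  # decorrelated KLT coefficients
    q = np.round(y / step).astype(int)          # uniform scalar quantizer indices
    return c, q

def decode(c, q, means, covs, step):
    # Midpoint reconstruction, then inverse KLT and re-adding the component mean.
    return means[c] + klt_basis(covs[c]) @ (q * step)

def entropy_bits(symbols):
    # Empirical (plug-in) entropy in bits; a stand-in for an ideal entropy coder.
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((k / total) * np.log2(k / total) for k in counts.values())

def empirical_rate(codes, n):
    # Bits per source dimension: H(C) for the label plus, for each component,
    # the entropy of each mode's quantizer index, weighted by component frequency.
    labels = [c for c, _ in codes]
    bits = entropy_bits(labels)
    for c in set(labels):
        qs = [q for cc, q in codes if cc == c]
        share = len(qs) / len(codes)
        bits += share * sum(entropy_bits([int(q[i]) for q in qs]) for i in range(n))
    return bits / n

# Toy 2-component mixture in n = 4 dimensions with known parameters.
rng = np.random.default_rng(0)
n, step = 4, 0.5
weights = [0.5, 0.5]
means = [np.zeros(n), np.full(n, 20.0)]
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random rotation: correlated data
cov0 = Q @ np.diag([4.0, 2.0, 1.0, 0.5]) @ Q.T
covs = [0.5 * (cov0 + cov0.T), np.diag([1.0, 1.0, 0.25, 0.25])]
true_labels = rng.choice(2, size=2000, p=weights)
X = [rng.multivariate_normal(means[c], covs[c]) for c in true_labels]

codes = [encode(x, weights, means, covs, step) for x in X]
X_hat = [decode(c, q, means, covs, step) for c, q in codes]
mse = np.mean([np.sum((x - xh) ** 2) / n for x, xh in zip(X, X_hat)])
print(f"rate ~ {empirical_rate(codes, n):.3f} bits/dim, MSE ~ {mse:.4f}")
```

Using one step size for every KLT mode is a deliberate simplification: at high resolution a uniform quantizer with step Δ contributes roughly Δ²/12 MSE per mode, so a common step loosely mirrors the single global water level, while modes whose variance is small relative to Δ mostly quantize to index 0 and cost few bits. A faithful ECSQ design would additionally optimize thresholds and entropy codes per mode.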