🤖 AI Summary
This work addresses the long-standing challenge of computing the differential entropy of mixture distributions, which generally has no closed form because the logarithm is coupled with a sum over components. Taking an information-theoretic channel perspective, the authors decompose the entropy into an average within-component uncertainty term and an inter-component overlap term, and propose the first deterministic closed-form approximation framework applicable to general mixture distributions. The method builds its approximation from pairwise overlap integrals between component densities, corrects the systematic bias of the Jensen overlap bound, and applies a clipping step so that the estimate always respects known entropy bounds. Extensive validation on Gaussian, Laplacian, and uniform mixtures demonstrates high accuracy across varying degrees of component separation, dimensionality, number of components, and covariance structure, with precision maintained even in the extreme regimes of complete overlap and near-perfect separation.
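To make the decomposition concrete: in the channel view, the latent label $Z$ with $P(Z{=}i) = w_i$ drives the observation $X \mid Z{=}i \sim p_i$, so the mixture density is $p = \sum_i w_i p_i$. The identities below are standard information theory, not quoted from the paper; the final inequality is one standard Jensen-type pairwise-overlap bound consistent with the description above, though not necessarily the paper's exact expression.

```latex
% Channel decomposition: average within-component uncertainty plus an
% overlap (mutual-information) term, bounded by the label entropy.
h(X) \;=\; h(X \mid Z) + I(X;Z) \;=\; \sum_i w_i\, h(p_i) + I(X;Z),
\qquad 0 \;\le\; I(X;Z) \;\le\; H(Z) = -\sum_i w_i \log w_i .

% Applying Jensen's inequality inside the logarithm yields a lower bound
% on h(X) built from pairwise overlap integrals
% c_{ij} = \int p_i(x)\, p_j(x)\, dx:
h(X) \;\ge\; -\sum_i w_i \log \sum_j w_j\, c_{ij} .
```

The two bounds on $I(X;Z)$ give the universal sandwich $\sum_i w_i h(p_i) \le h(X) \le \sum_i w_i h(p_i) + H(Z)$, which is exactly the interval the clipping mechanism enforces.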
📝 Abstract
Mixture distributions are a workhorse model for multimodal data in information theory, signal processing, and machine learning. Yet even when each component density is simple, the differential entropy of the mixture is notoriously hard to compute because the mixture couples a logarithm with a sum. This paper develops a deterministic, closed-form toolkit for bounding and accurately approximating mixture entropy directly from component parameters. Our starting point is an information-theoretic channel viewpoint: the latent mixture label plays the role of an input, and the observation is the output. This viewpoint separates mixture entropy into an average within-component uncertainty plus an overlap term that quantifies how much the observation reveals about the hidden label. We then bound and approximate this overlap term using pairwise overlap integrals between component densities, yielding explicit expressions whenever these overlaps admit a closed form. A simple, family-dependent offset corrects the systematic bias of the Jensen overlap bound and is calibrated to be exact in the two limiting regimes of complete overlap and near-perfect separation. A final clipping step guarantees that the estimate always respects universal information-theoretic bounds. Closed-form specializations are provided for Gaussian, factorized Laplacian, uniform, and hybrid mixtures, and numerical experiments validate the resulting bounds and approximations across separation, dimension, number of components, and correlated covariances.
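As a concrete illustration of the Gaussian specialization, the sketch below combines the ingredients the abstract names: closed-form pairwise overlaps (for Gaussians, $\int \mathcal{N}_i \mathcal{N}_j\,dx = \mathcal{N}(\mu_i; \mu_j, \Sigma_i + \Sigma_j)$), the Jensen overlap bound, an additive offset calibrated to the two limiting regimes, and clipping to the universal bounds. This is a minimal sketch, not the paper's estimator: the offset shown (the weighted gap between Shannon and Rényi-2 component entropies, which makes the estimate exact at complete overlap and at full separation) is an assumed form, and all function names are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_entropy(cov):
    """Differential entropy of N(mu, cov): 0.5 * log((2*pi*e)^d det(cov))."""
    d = cov.shape[0]
    return 0.5 * (d * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

def gaussian_renyi2(cov):
    """Renyi-2 entropy -log(int p^2) = 0.5 * log((4*pi)^d det(cov))."""
    d = cov.shape[0]
    return 0.5 * (d * np.log(4 * np.pi) + np.linalg.slogdet(cov)[1])

def overlap(mu_i, cov_i, mu_j, cov_j):
    """Pairwise overlap c_ij = int N(x; mu_i, S_i) N(x; mu_j, S_j) dx,
    available in closed form as the density N(mu_i; mu_j, S_i + S_j)."""
    return multivariate_normal.pdf(mu_i, mean=mu_j, cov=cov_i + cov_j)

def mixture_entropy_estimate(w, mus, covs):
    """Deterministic Gaussian-mixture entropy estimate (illustrative sketch):
    Jensen overlap bound + assumed bias offset + clipping to known bounds."""
    w = np.asarray(w, dtype=float)
    k = len(w)
    C = np.array([[overlap(mus[i], covs[i], mus[j], covs[j])
                   for j in range(k)] for i in range(k)])
    # Jensen lower bound on h(X): -sum_i w_i log sum_j w_j c_ij.
    h_jensen = -np.dot(w, np.log(C @ w))
    # Assumed offset: weighted Shannon-minus-Renyi-2 gap, exact in both
    # limiting regimes (complete overlap, near-perfect separation).
    offset = np.dot(w, [gaussian_entropy(S) - gaussian_renyi2(S) for S in covs])
    # Clip to the universal bounds h(X|Z) <= h(X) <= h(X|Z) + H(Z).
    h_within = np.dot(w, [gaussian_entropy(S) for S in covs])
    h_label = -np.dot(w, np.log(w))
    return float(np.clip(h_jensen + offset, h_within, h_within + h_label))

# Example: two well-separated 2-D components, where the true entropy
# approaches h(N(0, I)) + log 2.
if __name__ == "__main__":
    w = [0.5, 0.5]
    mus = [np.zeros(2), np.array([10.0, 0.0])]
    covs = [np.eye(2), np.eye(2)]
    print(mixture_entropy_estimate(w, mus, covs))
```

With the components fully overlapping (identical means and covariances), the same routine returns the single-component entropy exactly, matching the calibration claim in the two limiting regimes.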