A Unified MDL-based Binning and Tensor Factorization Framework for PDF Estimation

📅 2025-04-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional uniform histogram-based density estimation suffers from poor local adaptability and produces non-differentiable estimates, particularly for multimodal or highly nonuniform distributions. To address these limitations, this paper proposes a nonparametric density estimation framework that integrates the Minimum Description Length (MDL) principle with tensor decomposition. Its key contributions are: (1) an MDL-based adaptive binning strategy with quantile cuts that improves local resolution; and (2) construction of a joint probability tensor, factorized with the canonical polyadic decomposition (CPD, also known as CANDECOMP/PARAFAC), to obtain a smooth density model. The resulting estimator is intended to be both locally adaptive and differentiable, which suits gradient-based optimization. Experiments on synthetic benchmarks and a real-world dry bean classification dataset are used to demonstrate the method's effectiveness for density estimation and for downstream tasks such as clustering and nonparametric discriminant analysis.
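To illustrate what binning with quantile cuts means in one dimension, here is a minimal sketch in Python. It is not the paper's implementation: the helper name `quantile_histogram`, the toy bimodal sample, and the choice of 10 bins are all assumptions made for illustration. Edges are placed at evenly spaced sample quantiles, so each bin holds roughly the same number of points and bins become narrow where the data are dense.

```python
import numpy as np

def quantile_histogram(x, n_bins):
    """Piecewise-constant density estimate with equal-mass (quantile) bins.

    Hypothetical helper for illustration; not taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    # Place edges at evenly spaced sample quantiles; np.unique drops
    # duplicate edges that can appear when the data contain ties.
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)))
    counts, _ = np.histogram(x, bins=edges)
    widths = np.diff(edges)
    # Density in each bin = fraction of samples / bin width.
    density = counts / (counts.sum() * widths)
    return edges, density

rng = np.random.default_rng(0)
# Toy bimodal sample: one narrow mode and one wide mode (assumed data).
x = np.concatenate([rng.normal(-2.0, 0.2, 500), rng.normal(3.0, 1.5, 500)])

edges, density = quantile_histogram(x, n_bins=10)
widths = np.diff(edges)
print("bin widths:", np.round(widths, 3))
```

The printed widths should be much smaller around the narrow mode than around the wide one, which is the local adaptivity that a uniform grid lacks. The estimate is still piecewise constant; the smoothness described in the summary comes from the later modeling stage, not from the binning itself.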

📝 Abstract
Reliable density estimation is fundamental for numerous applications in statistics and machine learning. In many practical scenarios, data are best modeled as mixtures of component densities that capture complex and multimodal patterns. However, conventional density estimators based on uniform histograms often fail to capture local variations, especially when the underlying distribution is highly nonuniform. Furthermore, the inherent discontinuity of histograms poses challenges for tasks requiring smooth derivatives, such as gradient-based optimization, clustering, and nonparametric discriminant analysis. In this work, we present a novel non-parametric approach for multivariate probability density function (PDF) estimation that utilizes minimum description length (MDL)-based binning with quantile cuts. Our approach builds upon tensor factorization techniques, leveraging the canonical polyadic decomposition (CPD) of a joint probability tensor. We demonstrate the effectiveness of our method on synthetic data and a challenging real dry bean classification dataset.
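The abstract does not state the exact MDL criterion, so the following Python sketch uses a generic two-part code length as a stand-in: the negative log-likelihood of the data under a quantile-cut histogram, plus a penalty of half the log of the sample size per interior cut point. This BIC-style penalty, the function name `code_length`, the candidate range of 1 to 40 bins, and the toy data are all assumptions, not the authors' method.

```python
import numpy as np

def code_length(x, n_bins):
    """Two-part code length (in nats) of a quantile-cut histogram.

    Assumed criterion for illustration: data cost + 0.5 * log(n) per cut point.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)))
    counts, _ = np.histogram(x, bins=edges)
    widths = np.diff(edges)
    nz = counts > 0  # skip empty bins to avoid log(0)
    # Log-likelihood of the sample under the piecewise-constant density.
    log_lik = np.sum(counts[nz] * np.log(counts[nz] / (n * widths[nz])))
    n_params = len(edges) - 2  # number of interior cut points
    return -log_lik + 0.5 * n_params * np.log(n)

rng = np.random.default_rng(0)
# Toy bimodal sample (assumed data).
x = np.concatenate([rng.normal(-2.0, 0.2, 500), rng.normal(3.0, 1.5, 500)])

candidates = list(range(1, 41))
lengths = [code_length(x, k) for k in candidates]
best_k = candidates[int(np.argmin(lengths))]
print("selected number of bins:", best_k)
```

The design idea is that adding bins lowers the data cost but raises the model cost, and the selected bin count is the one that minimizes their sum. Other MDL formulations, such as normalized maximum likelihood, would change the penalty term but not this overall structure.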
Problem

Research questions and friction points this paper is trying to address.

Estimating multimodal densities when the underlying distribution is highly nonuniform
Overcoming histogram discontinuity for tasks that require smooth derivatives
Providing reliable multivariate PDF estimation via MDL and CPD
Innovation

Methods, ideas, or system contributions that make the work stand out.

MDL-based binning with quantile cuts
Tensor factorization for PDF estimation
Canonical polyadic decomposition (CPD) technique
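As a rough illustration of the CPD idea listed above, the sketch below builds a small three-way joint probability tensor from rank-2 CP factors. In this form, the CP model can be read as a latent-variable mixture: a weight vector over components, and one column-stochastic factor matrix per variable. The dimensions, rank, and random factors are assumptions chosen for illustration. How the paper fits these factors to data is not described on this page and is not shown here.

```python
import numpy as np

rng = np.random.default_rng(0)
dims, rank = (4, 5, 3), 2  # assumed bin counts per variable and CP rank

# Mixture weights over the latent components; they sum to 1.
lam = rng.dirichlet(np.ones(rank))
# One factor matrix per variable, shape (I_n, rank); each column is a
# conditional PMF of that variable given the latent component.
factors = [rng.dirichlet(np.ones(size), size=rank).T for size in dims]
A, B, C = factors

# CP reconstruction: joint[i, j, k] = sum_r lam[r] * A[i, r] * B[j, r] * C[k, r]
joint = np.einsum("r,ir,jr,kr->ijk", lam, A, B, C)

# Marginalizing the tensor over the other variables matches the factor form.
marginal_first = joint.sum(axis=(1, 2))
print(np.allclose(marginal_first, A @ lam))
```

The full tensor here has 60 entries, while the factors hold only 2 + 2 × (4 + 5 + 3) = 26 numbers. That gap widens quickly with more variables, which is the usual motivation for low-rank models of joint distributions.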
Mustafa Musab
Communications Research Laboratory, Ilmenau University of Technology, Ilmenau, Germany
Joseph K. Chege
Communications Research Laboratory, Ilmenau University of Technology, Ilmenau, Germany
Arie Yeredor
Professor, School of Electrical Engineering, Tel-Aviv University
Statistical Signal Processing, Estimation Theory, Blind Signal Processing
Martin Haardt
University Professor, TU Ilmenau
statistical signal processing, wireless communications, array signal processing, tensor decomposition, MIMO