🤖 AI Summary
Closed-form discrete probability distributions have traditionally been derived by hand, one case at a time, which hinders the automatic discovery of interpretable models. This work proposes Symbolic Density Estimation (SDE), an unsupervised framework that combines structural priors, evolutionary search, and validity-aware parameter inference to discover closed-form probability mass functions within a structured symbolic space built from elementary mathematical operations. SDE also accommodates richer distributional features such as zero inflation and finite mixtures. The authors contribute a benchmark dataset covering commonly used discrete distributions and report that SDE recovers all benchmark families with accurate parameter estimates. On real-world data, SDE identifies concise, interpretable mixture models that improve goodness-of-fit over standard models.
📝 Abstract
Discrete probability laws underpin statistical modeling, yet the catalog of interpretable distributions has expanded only gradually through centuries of case-by-case mathematical derivations. We introduce symbolic density estimation (SDE), an unsupervised framework that automatically recovers closed-form probability mass functions by composing elementary analytic operations within a structured search space. Our method integrates domain-specific structural priors with evolutionary search and a validity-aware inference stage, and it extends to richer distribution families such as zero-inflated and finite-mixture models. To support systematic evaluation and future research, we contribute a benchmark dataset spanning a broad collection of commonly used discrete distributions. The proposed algorithm recovers all benchmark families with accurate parameter estimates. A real-data application shows that it identifies concise and interpretable mixture models that improve goodness-of-fit over standard models.
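To make the idea of a validity-aware stage concrete, the following is a minimal sketch, not the authors' implementation. It assumes a candidate expression is an unnormalized kernel composed of elementary operations, that validity means the kernel is nonnegative and finite with positive total mass on a truncated support, and that zero inflation is the usual mixture with a point mass at zero. All function names and the truncation bound `k_max` are hypothetical choices made for illustration.

```python
import math


def poisson_kernel(k, lam):
    """Unnormalized candidate lam**k / k!, evaluated in log space for stability."""
    return math.exp(k * math.log(lam) - math.lgamma(k + 1))


def normalize(kernel, params, k_max=200):
    """Turn a kernel into a PMF on {0, ..., k_max} by dividing by its sum.

    Returns None when the candidate is invalid: any negative or non-finite
    value, or a total mass that is zero or non-finite.
    """
    weights = [kernel(k, *params) for k in range(k_max + 1)]
    if any((not math.isfinite(w)) or w < 0 for w in weights):
        return None
    total = sum(weights)
    if total <= 0 or not math.isfinite(total):
        return None
    return [w / total for w in weights]


def zero_inflate(pmf, pi):
    """Mix a point mass at zero (weight pi) with an existing PMF (weight 1 - pi)."""
    inflated = [(1 - pi) * p for p in pmf]
    inflated[0] += pi
    return inflated


def log_likelihood(pmf, data):
    """Sum of log-probabilities of the observed counts under the PMF."""
    return sum(math.log(pmf[k]) for k in data)


# A valid candidate: normalizing the kernel recovers the Poisson PMF.
pmf = normalize(poisson_kernel, (3.0,))

# An invalid candidate: a - k turns negative once k exceeds a, so it is rejected.
rejected = normalize(lambda k, a: a - k, (5.0,))

# A zero-inflated variant scored on a small, made-up sample of counts.
zi_pmf = zero_inflate(pmf, 0.3)
score = log_likelihood(zi_pmf, [0, 0, 1, 4])
```

In a search over symbolic expressions, a check of this kind would let invalid candidates be discarded before any likelihood is computed, so that only expressions defining a proper probability mass function compete on fit.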