🤖 AI Summary
This work proposes AI-Kolmogorov, a framework that applies symbolic regression to probability density estimation, a task the authors call Symbolic Density Estimation (SymDE). The method is a multi-stage pipeline: it first decomposes the problem through clustering and/or probabilistic graphical model structure learning, then applies nonparametric density estimation and support estimation, and finally runs symbolic regression on the density estimate to obtain interpretable analytic expressions for probability densities. Evaluated on synthetic mixture models, multivariate normal distributions, and three exotic distributions, two of them motivated by high-energy physics, the framework recovers the underlying distributions or otherwise provides insight into the mathematical expressions that describe them.
📝 Abstract
We introduce AI-Kolmogorov, a novel framework for Symbolic Density Estimation (SymDE). Symbolic regression (SR) has been effectively used to produce interpretable models in standard regression settings, but its applicability to density estimation tasks has been largely unexplored. To address the SymDE task, we introduce a multi-stage pipeline: (i) problem decomposition through clustering and/or probabilistic graphical model structure learning; (ii) nonparametric density estimation; (iii) support estimation; and finally (iv) SR on the density estimate. We demonstrate the efficacy of AI-Kolmogorov on synthetic mixture models, multivariate normal distributions, and three exotic distributions, two of which are motivated by applications in high-energy physics. We show that AI-Kolmogorov can discover underlying distributions or otherwise provide valuable insight into the mathematical expressions describing them.
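To make stages (ii) to (iv) concrete, here is a minimal one-dimensional sketch in Python. It is not the authors' implementation, and several pieces are assumptions made for illustration: the function names (`kde`, `fit_quadratic`, `toy_symde`) are hypothetical, the bandwidth uses Silverman's rule of thumb, the support is crudely approximated by the 1st to 99th percentile range, and a least-squares quadratic fit to the log-density stands in for symbolic regression. A real SR engine searches over expression structures rather than fitting a fixed form. Stage (i), decomposition, is omitted because the toy data has a single component.

```python
import math
import random
import statistics


def kde(samples, bandwidth):
    """Return a Gaussian kernel density estimate as a callable (stage ii)."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def fit_quadratic(xs, ys):
    """Least-squares fit ys ~ c0 + c1*x + c2*x^2 via the normal equations."""
    powers = [sum(x ** k for x in xs) for k in range(5)]
    a = [[powers[i + j] for j in range(3)] for i in range(3)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    n = 3
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(a[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (b[r] - tail) / a[r][r]
    return coeffs


def toy_symde(samples, grid_size=41):
    """Run the toy stages (ii)-(iv); return ((lo, hi), [c0, c1, c2])."""
    # (ii) Nonparametric density estimate; Silverman's rule is an assumption.
    bandwidth = 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)
    density = kde(samples, bandwidth)

    # (iii) Crude support estimate: the 1st to 99th percentile range.
    cuts = statistics.quantiles(samples, n=100)
    lo, hi = cuts[0], cuts[-1]

    # (iv) Stand-in for SR: fit log p(x) ~ c0 + c1*x + c2*x^2 on a grid.
    step = (hi - lo) / (grid_size - 1)
    xs = [lo + i * step for i in range(grid_size)]
    ys = [math.log(density(x)) for x in xs]
    return (lo, hi), fit_quadratic(xs, ys)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(2000)]
    (lo, hi), (c0, c1, c2) = toy_symde(data)
    print(f"support ~ [{lo:.2f}, {hi:.2f}]")
    print(f"p(x) ~ exp({c0:.3f} + {c1:.3f}*x + {c2:.3f}*x^2)")
```

For standard normal samples, the fitted `c2` should land near -0.5, slightly closer to zero because kernel smoothing inflates the variance of the estimate. The log-quadratic form is exact only for Gaussian data; the point of a symbolic regression stage is to find the functional form when it is not known in advance.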