Sufficient digits and density estimation: A Bayesian nonparametric approach using generalized finite Pólya trees

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
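Proposes a Bayesian nonparametric method based on generalized finite Pólya trees that addresses density estimation for continuous random variables through their digit representation, avoids MCMC computation, and establishes posterior consistency.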

📝 Abstract
This paper proposes a novel approach for statistical modelling of a continuous random variable $X$ on $[0, 1)$, based on its digit representation $X=.X_1X_2\ldots$. In general, $X$ can be coupled with a random variable $N$ so that if a prior on $N$ is imposed, $(X_1,\ldots,X_N)$ becomes a sufficient statistic and $.X_{N+1}X_{N+2}\ldots$ is uniformly distributed. In line with this fact, and focusing on binary digits for simplicity, we propose a family of generalized finite Pólya trees that induces a random density for a sample, which becomes a flexible tool for density estimation. Here, the digit system may be random and learned from the data. We provide a detailed Bayesian analysis, including a closed-form expression for the posterior distribution, which sidesteps the need for MCMC methods for posterior inference. We analyse the frequentist properties as the sample size increases, and provide sufficient conditions for consistency of the posterior distributions of the random density and $N$. We consider an extension to data spanning multiple orders of magnitude, and propose a prior distribution that encodes the so-called extended Newcomb-Benford law. Such a model shows promising results for density estimation of human-activity data. Our methodology is illustrated on several synthetic and real datasets.
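To make the digit representation concrete, the following minimal sketch splits a number $x$ in $[0, 1)$ into its first $n$ binary digits $(X_1,\ldots,X_n)$ and the rescaled remainder $.X_{n+1}X_{n+2}\ldots$. The function name `binary_digits` and the fixed, non-random `n` are assumptions made for illustration; they are not taken from the paper.

```python
def binary_digits(x, n):
    """Return the first n binary digits of x in [0, 1) and the rescaled remainder.

    Illustrative helper (hypothetical name): n is fixed here, whereas the
    abstract treats the number of digits N as a random variable.
    """
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    digits = []
    for _ in range(n):
        x *= 2.0
        d = int(x)        # next binary digit, 0 or 1
        digits.append(d)
        x -= d            # keep only the fractional part
    # x is now the number whose binary expansion is .X_{n+1}X_{n+2}...
    return digits, x


# 0.8125 is .1101 in binary, so the first three digits are 1, 1, 0
# and the remainder is .1 in binary, i.e. 0.5.
digits, remainder = binary_digits(0.8125, 3)
print(digits, remainder)
```

In the abstract's terms, `digits` plays the role of the sufficient statistic and `remainder` is the part that is uniformly distributed on $[0, 1)$ once $N$ is given.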
Problem

Research questions and friction points this paper is trying to address.

Estimating the density of a continuous random variable via its digit representation
Developing a Bayesian nonparametric method based on generalized finite Pólya trees
Ensuring posterior consistency for the random density and the number of digits $N$
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian nonparametric approach with generalized finite Pólya trees
Closed-form posterior that avoids MCMC methods
Digit system that can be learned from the data
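
The idea of a closed-form posterior can be illustrated with a standard finite Pólya tree, which is simpler than the generalized family the paper proposes. The sketch below assumes a fixed depth, binary digits, and independent symmetric Beta(a, a) priors on every splitting probability; the names `prefix_counts`, `posterior_mean_density`, and the parameter `a` are hypothetical. Under these assumptions Beta-binomial conjugacy gives the posterior mean density directly from counts, with no sampling.

```python
from collections import Counter


def prefix_counts(data, depth):
    """Count observations in each dyadic interval, keyed by binary digit prefix."""
    counts = Counter()
    for x in data:
        prefix = ()
        for _ in range(depth):
            x *= 2.0
            d = int(x)
            x -= d
            prefix += (d,)
            counts[prefix] += 1
    return counts


def posterior_mean_density(x, data, depth, a=1.0):
    """Posterior mean density at x for a standard finite Polya tree.

    Assumption: every split probability has an independent Beta(a, a) prior.
    This is NOT the paper's generalized construction, only a textbook case.
    """
    counts = prefix_counts(data, depth)
    density = 1.0
    prefix = ()
    n_parent = len(data)
    for _ in range(depth):
        x *= 2.0
        d = int(x)
        x -= d
        prefix += (d,)
        n_child = counts[prefix]  # Counter returns 0 for unseen prefixes
        # Conjugate update: E[split prob | data] = (a + n_child) / (2a + n_parent).
        # The factor 2 converts interval probability into density at each level.
        density *= 2.0 * (a + n_child) / (2.0 * a + n_parent)
        n_parent = n_child
    return density


sample = [0.1, 0.2, 0.3, 0.9]
print(posterior_mean_density(0.25, sample, depth=1))  # left half, 3 of 4 points
print(posterior_mean_density(0.75, sample, depth=1))  # right half, 1 of 4 points
```

With no data every factor equals 1, so the sketch returns the uniform density, and the estimate is piecewise constant on intervals of length $2^{-\text{depth}}$.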