Bandwidth Selectors on Semiparametric Bayesian Networks

📅 2025-06-20
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Problem: In semiparametric Bayesian networks (SPBNs), the bandwidth matrix of the kernel density estimators is conventionally learned with the normal rule, which assumes Gaussian data. Real-world data often deviate from normality, which can lead to suboptimal density estimates and reduced predictive performance.
Method: The paper first sets out the theoretical framework for applying state-of-the-art bandwidth selectors in SPBNs, then evaluates two families of data-driven selectors: cross-validation (including unbiased cross-validation, UCV) and plug-in selectors.
Contribution/Results: In the experiments, the proposed selectors exploit additional data more effectively than the normal rule, which is robust but stagnates as the sample size grows. UCV generally outperforms the normal rule, with its advantage most visible at large sample sizes. To support the study, the authors extended the open-source PyBNesian library with the additional bandwidth selection techniques.

📝 Abstract
Semiparametric Bayesian networks (SPBNs) integrate parametric and non-parametric probabilistic models, offering flexibility in learning complex data distributions from samples. In particular, kernel density estimators (KDEs) are employed for the non-parametric component. Under the assumption of data normality, the normal rule is used to learn the bandwidth matrix for the KDEs in SPBNs. This matrix is the key hyperparameter that controls the trade-off between bias and variance. However, real-world data often deviates from normality, potentially leading to suboptimal density estimation and reduced predictive performance. This paper first establishes the theoretical framework for the application of state-of-the-art bandwidth selectors and subsequently evaluates their impact on SPBN performance. We explore the approaches of cross-validation and plug-in selectors, assessing their effectiveness in enhancing the learning capability and applicability of SPBNs. To support this investigation, we have extended the open-source package PyBNesian for SPBNs with the additional bandwidth selection techniques and conducted extensive experimental analyses. Our results demonstrate that the proposed bandwidth selectors leverage increasing information more effectively than the normal rule, which, despite its robustness, stagnates with more data. In particular, unbiased cross-validation generally outperforms the normal rule, highlighting its advantage in high sample size scenarios.
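To make the role of the bandwidth matrix concrete, here is a minimal sketch of a normal-reference ("normal rule") selector for a Gaussian kernel, using the common textbook form H = (4 / (d + 2))^(2/(d+4)) · n^(-2/(d+4)) · S, where S is the sample covariance. The exact constant used in SPBNs and PyBNesian is an assumption here, and the function names are hypothetical; this is not PyBNesian code.

```python
import random

def sample_covariance(rows):
    """Unbiased sample covariance matrix of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    return [
        [sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
         for b in range(d)]
        for a in range(d)
    ]

def normal_rule_bandwidth(rows):
    """Normal-reference bandwidth matrix (assumed textbook form, Gaussian kernel)."""
    n, d = len(rows), len(rows[0])
    # Scalar factor shrinks with n; the matrix shape comes only from the covariance.
    factor = (4.0 / (d + 2)) ** (2.0 / (d + 4)) * n ** (-2.0 / (d + 4))
    return [[factor * s for s in row] for row in sample_covariance(rows)]

# Illustrative synthetic data: two independent Gaussian coordinates.
rng = random.Random(0)
small = [[rng.gauss(0, 1), rng.gauss(0, 2)] for _ in range(100)]
large = small + [[rng.gauss(0, 1), rng.gauss(0, 2)] for _ in range(9900)]

H_small = normal_rule_bandwidth(small)
H_large = normal_rule_bandwidth(large)
print("n=100:  ", H_small)
print("n=10000:", H_large)
```

Under this rule the data influence the bandwidth only through the covariance and the sample size, so a skewed or multimodal sample with the same covariance receives the same matrix. That is the limitation the data-driven selectors in the paper are meant to address.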
Problem

Research questions and friction points this paper is trying to address.

Optimizing bandwidth selection in SPBNs for non-normal data
Comparing cross-validation and plug-in bandwidth selector effectiveness
Enhancing SPBN performance with advanced bandwidth techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

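Establishes the theoretical framework for applying state-of-the-art bandwidth selectors to the KDEs in SPBNs
Evaluates cross-validation and plug-in bandwidth selectors as alternatives to the normal rule
Extends the open-source PyBNesian package with the additional bandwidth selectors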
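
As a rough illustration of how a cross-validation selector differs from the normal rule, the sketch below implements unbiased cross-validation for a one-dimensional Gaussian KDE with a scalar bandwidth h, chosen by grid search. This is a deliberate simplification: the paper works with bandwidth matrices inside SPBNs, and none of the function names below come from PyBNesian. The bimodal sample and the grid are illustrative assumptions.

```python
import math
import random
import statistics

def gauss_pdf(u, sigma):
    """Density of N(0, sigma^2) evaluated at u."""
    return math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def ucv_score(diffs, n, h):
    """UCV(h): integral of fhat^2 minus twice the mean leave-one-out density."""
    wide = math.sqrt(2.0) * h            # two Gaussian kernels convolved
    sq_term = n * gauss_pdf(0.0, wide)   # i == j terms of the double sum
    loo_term = 0.0
    for diff in diffs:                   # each unordered pair counts twice
        sq_term += 2.0 * gauss_pdf(diff, wide)
        loo_term += 2.0 * gauss_pdf(diff, h)
    return sq_term / n**2 - 2.0 * loo_term / (n * (n - 1))

def ucv_bandwidth(xs, grid):
    """Pick the grid value of h that minimises the UCV criterion."""
    n = len(xs)
    diffs = [xs[i] - xs[j] for i in range(n) for j in range(i + 1, n)]
    return min(grid, key=lambda h: ucv_score(diffs, n, h))

def normal_rule_bandwidth_1d(xs):
    """One-dimensional normal-reference bandwidth: (4/3)^(1/5) * sd * n^(-1/5)."""
    return (4.0 / 3.0) ** 0.2 * statistics.stdev(xs) * len(xs) ** -0.2

# Illustrative non-Gaussian sample: two well-separated modes.
rng = random.Random(1)
xs = ([rng.gauss(-3.0, 0.5) for _ in range(100)]
      + [rng.gauss(3.0, 0.5) for _ in range(100)])
grid = [0.05 * k for k in range(1, 31)]  # candidate bandwidths 0.05 .. 1.50

h_ucv = ucv_bandwidth(xs, grid)
h_nr = normal_rule_bandwidth_1d(xs)
print(f"UCV: {h_ucv:.2f}, normal rule: {h_nr:.2f}")
```

On a sample like this, the normal rule scales with the overall standard deviation, which is inflated by the gap between the modes, while UCV scores each candidate bandwidth against the data directly. The pairwise sums make the criterion quadratic in the sample size, so this naive version is only practical for small samples.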