Adaptive Kernel Selection for Stein Variational Gradient Descent

📅 2025-10-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
SVGD relies on fixed kernel parameters, particularly the bandwidth, which leads to suboptimal performance in Bayesian inference; the common median heuristic degrades further in high dimensions. To address this, we propose an adaptive kernel-parameter optimization framework: within a reproducing kernel Hilbert space (RKHS), the kernel bandwidth is treated as a learnable parameter and optimized by gradient ascent to maximize the kernelized Stein discrepancy (KSD), alternating with the SVGD particle updates. The approach drops the conventional fixed-kernel assumption while retaining computational tractability, and is accompanied by a convergence analysis. Experiments across diverse Bayesian inference tasks show consistent improvements in both convergence speed and posterior-approximation accuracy over baselines such as the median heuristic.

📝 Abstract
A central challenge in Bayesian inference is efficiently approximating posterior distributions. Stein Variational Gradient Descent (SVGD) is a popular variational inference method which transports a set of particles to approximate a target distribution. The SVGD dynamics are governed by a reproducing kernel Hilbert space (RKHS) and are highly sensitive to the choice of the kernel function, which directly influences both convergence and approximation quality. The commonly used median heuristic offers a simple approach for setting kernel bandwidths but lacks flexibility and often performs poorly, particularly in high-dimensional settings. In this work, we propose an alternative strategy for adaptively choosing kernel parameters over an abstract family of kernels. Recent convergence analyses based on the kernelized Stein discrepancy (KSD) suggest that optimizing the kernel parameters by maximizing the KSD can improve performance. Building on this insight, we introduce Adaptive SVGD (Ad-SVGD), a method that alternates between updating the particles via SVGD and adaptively tuning kernel bandwidths through gradient ascent on the KSD. We provide a simplified theoretical analysis that extends existing results on minimizing the KSD for fixed kernels to our adaptive setting, showing convergence properties for the maximal KSD over our kernel class. Our empirical results further support this intuition: Ad-SVGD consistently outperforms standard heuristics in a variety of tasks.
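The alternating scheme described in the abstract (SVGD particle updates interleaved with gradient ascent on the KSD over the bandwidth) can be sketched as follows. This is a simplified NumPy illustration, not the authors' implementation: the KSD gradient is approximated by a finite difference on the log-bandwidth rather than autodiff, and the log-bandwidth is clipped to a fixed range for stability; both are assumptions made here, as are all function names and hyperparameter values.

```python
import numpy as np

def rbf_kernel(X, h):
    """RBF kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 h^2)), plus squared distances."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * h * h)), sq

def svgd_step(X, score, h, step_size):
    """One SVGD update: phi_i = mean_j [k(x_j, x_i) s(x_j) + grad_{x_j} k(x_j, x_i)]."""
    n = X.shape[0]
    K, _ = rbf_kernel(X, h)
    S = score(X)                                  # s(x_j) = grad log p(x_j), shape (n, d)
    diff = X[:, None, :] - X[None, :, :]          # diff[i, j] = x_i - x_j
    repulsion = np.sum(diff * K[:, :, None], axis=1) / (h * h)
    return X + step_size * (K @ S + repulsion) / n

def ksd_sq(X, score, h):
    """U-statistic estimate of the squared kernelized Stein discrepancy (RBF kernel)."""
    n, d = X.shape
    K, sq = rbf_kernel(X, h)
    S = score(X)
    diff = X[:, None, :] - X[None, :, :]
    u = K * (S @ S.T)                                        # k(x_i, x_j) * s_i . s_j
    u += np.einsum('id,ijd->ij', S, diff) * K / (h * h)      # s_i . grad_{x_j} k
    u -= np.einsum('jd,ijd->ij', S, diff) * K / (h * h)      # s_j . grad_{x_i} k
    u += K * (d / (h * h) - sq / h ** 4)                     # trace(grad_x grad_y k)
    np.fill_diagonal(u, 0.0)                                 # U-statistic: drop i == j terms
    return u.sum() / (n * (n - 1))

def adaptive_svgd(X0, score, n_iters=300, step_size=0.1, log_h=0.0, lr=0.05, eps=1e-2):
    """Alternate SVGD particle updates with gradient ascent on the KSD over log-bandwidth."""
    X = X0.copy()
    for _ in range(n_iters):
        # Finite-difference ascent step on log h (a stand-in for autodiff in this sketch).
        g = (ksd_sq(X, score, np.exp(log_h + eps))
             - ksd_sq(X, score, np.exp(log_h - eps))) / (2.0 * eps)
        log_h = np.clip(log_h + lr * g, -2.0, 2.0)  # clipping is a practical safeguard, not from the paper
        X = svgd_step(X, score, np.exp(log_h), step_size)
    return X, float(np.exp(log_h))
```

For a standard normal target the score is simply `score(X) = -X`; starting from particles offset from the origin, the particle cloud drifts toward the target while the bandwidth adapts each iteration instead of staying fixed at a heuristic value.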
Problem

Research questions and friction points this paper is trying to address.

Adaptively selecting kernel parameters for SVGD
Improving convergence and approximation quality
Overcoming limitations of median heuristic bandwidth
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive kernel selection for SVGD
Optimizing kernel parameters via KSD
Alternating particle updates with bandwidth tuning
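For reference, the median-heuristic baseline that the paper compares against sets the bandwidth from the median pairwise distance between particles. A minimal sketch, using the convention of the original SVGD paper (Liu and Wang, 2016), where the kernel is written as exp(-||x - y||^2 / h) and h = med^2 / log n:

```python
import numpy as np

def median_heuristic_bandwidth(X):
    """Median heuristic: h = med^2 / log(n), with med the median pairwise distance.

    Convention: kernel written as exp(-||x - y||^2 / h); other parameterizations
    rescale this value accordingly.
    """
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    med_sq = np.median(sq[np.triu_indices(n, k=1)])  # median over distinct pairs
    return med_sq / np.log(n)
```

Because this value depends only on the current particle positions, not on the target distribution, it cannot exploit score information the way KSD maximization can, which is the limitation the Innovation bullets above address.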