Sparse Variational Student-t Processes

📅 2023-12-09
🏛️ AAAI Conference on Artificial Intelligence
📈 Citations: 2
Influential: 0
🤖 AI Summary
To address the high computational cost and poor scalability of Student-t processes (TPs) on large-scale heavy-tailed or outlier-contaminated data, this paper introduces the first sparse variational Student-t process (SVTP). Methodologically, the authors propose a conditional-distribution-based inducing point framework and develop two KL divergence approximation strategies—Monte Carlo sampling and a Jensen's inequality-based analytical approximation—to jointly ensure robustness and scalability. Leveraging sparse variational inference, Bayesian learning, and stochastic gradient optimization, SVTP reduces time complexity from $O(N^3)$ to $O(M^2N)$, where $M \ll N$. Experiments on multiple UCI and Kaggle benchmark datasets demonstrate that SVTP achieves lower computational overhead, higher predictive accuracy, and superior robustness to outliers compared to state-of-the-art baselines. This work establishes an efficient, scalable, and robust Bayesian nonparametric paradigm for modeling heavy-tailed data.
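The SVTP construction itself is not reproduced here, but the modeling background is easy to illustrate: a Student-t process has multivariate Student-t finite marginals, which can be sampled as a Gaussian scale mixture (a standard identity, not code from the paper; `sample_mvt` is a hypothetical helper name):

```python
import numpy as np

def sample_mvt(mean, cov, df, n_samples, rng):
    """Draw samples from a multivariate Student-t distribution via its
    Gaussian scale-mixture representation:
        x = mean + z / sqrt(g),  z ~ N(0, cov),  g ~ Gamma(df/2, rate=df/2).
    The resulting marginals have heavier tails than a Gaussian, which is
    what makes TP-based models robust to outliers."""
    L = np.linalg.cholesky(cov)                     # cov = L @ L.T
    z = rng.standard_normal((n_samples, len(mean))) @ L.T
    # Gamma with shape df/2 and rate df/2 (scale = 2/df), so E[g] = 1.
    g = rng.gamma(shape=df / 2.0, scale=2.0 / df, size=(n_samples, 1))
    return mean + z / np.sqrt(g)
```

For `df > 2` the covariance of these samples is `cov * df / (df - 2)`, inflating the Gaussian case; as `df` grows the distribution approaches the Gaussian, recovering GP behavior.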
📝 Abstract
The theory of Bayesian learning incorporates the use of Student-t Processes to model heavy-tailed distributions and datasets with outliers. However, despite Student-t Processes having a computational complexity similar to that of Gaussian Processes, there has been limited emphasis on sparse representations of this model. This is mainly due to the increased difficulty in modeling and computation compared to previous sparse Gaussian Processes. Our motivation is to address the need for a sparse representation framework that reduces computational complexity, allowing Student-t Processes to be more flexible for real-world datasets. To achieve this, we leverage the conditional distribution of Student-t Processes to introduce sparse inducing points. Bayesian methods and variational inference are then utilized to derive a well-defined lower bound, facilitating more efficient optimization of our model through stochastic gradient descent. We propose two methods for computing the variational lower bound, one utilizing Monte Carlo sampling and the other employing Jensen's inequality to compute the KL regularization term in the loss function. We propose adopting these approaches as viable alternatives to Gaussian processes when the data might contain outliers or exhibit heavy-tailed behavior, and we provide specific recommendations for their applicability. We evaluate the two proposed approaches on various synthetic and real-world datasets from UCI and Kaggle, demonstrating their effectiveness compared to baseline methods in terms of computational complexity and accuracy, as well as their robustness to outliers.
Problem

Research questions and friction points this paper is trying to address.

Develop sparse Student-t Processes for heavy-tailed data
Reduce computational complexity via variational inference
Enhance robustness to outliers in real-world datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse inducing points for Student-t Processes
Variational inference with Bayesian methods
Monte Carlo and Jensen's inequality for optimization
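The first of the two KL strategies listed above is a generic Monte Carlo estimator of the KL regularizer. The paper applies it to the variational posterior over inducing variables; a minimal illustration with one-dimensional scipy distributions standing in (the `mc_kl` helper and the Gaussian test case are illustrative, not the paper's actual loss):

```python
import numpy as np
from scipy.stats import norm

def mc_kl(q_dist, p_dist, n_samples, rng):
    """Monte Carlo estimate of KL(q || p): the expectation of the
    log-density ratio log q(x) - log p(x) under samples x ~ q."""
    x = q_dist.rvs(size=n_samples, random_state=rng)
    return np.mean(q_dist.logpdf(x) - p_dist.logpdf(x))
```

For Gaussians the estimate can be checked against the closed form `log(s1/s0) + (s0**2 + (m0 - m1)**2) / (2 * s1**2) - 0.5`; for the Student-t case no such closed form is available, which is why the paper pairs sampling with a Jensen's inequality-based analytical bound.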