Posterior Sampling of Probabilistic Word Embeddings

📅 2025-08-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Quantifying uncertainty in word embeddings is critical for reliable text inference, yet existing Bayesian methods such as Hamiltonian Monte Carlo (HMC) and mean-field variational inference (MFVI) suffer from poor scalability or restrictive assumptions. This paper proposes a scalable Pólya-Gamma data-augmented Gibbs sampler for posterior sampling of word embeddings, compares it with a Laplace approximation, MFVI, and HMC, and addresses the non-identifiability of embedding models. Empirical evaluation on the US Congress and MovieLens datasets demonstrates feasibility on larger real data and improvements over maximum a posteriori (MAP) estimation: posterior means yield higher held-out likelihood, especially in small-sample regimes, and the Gibbs sampler's uncertainty estimates are accurate where MFVI's are not. Key contributions are: (1) a scalable fully Bayesian inference scheme for word embeddings applicable to large corpora; and (2) an uncertainty-modeling approach that combines computational efficiency with statistical rigor.

📝 Abstract
Quantifying uncertainty in word embeddings is crucial for reliable inference from textual data. However, existing Bayesian methods such as Hamiltonian Monte Carlo (HMC) and mean-field variational inference (MFVI) are either computationally infeasible for large data or rely on restrictive assumptions. We propose a scalable Gibbs sampler using Polya-Gamma augmentation as well as Laplace approximation and compare them with MFVI and HMC for word embeddings. In addition, we address non-identifiability in word embeddings. Our Gibbs sampler and HMC correctly estimate uncertainties, while MFVI does not, and Laplace approximation only does so on large sample sizes, as expected. Applying the Gibbs sampler to the US Congress and the MovieLens datasets, we demonstrate the feasibility on larger real data. Finally, as a result of having draws from the full posterior, we show that the posterior mean of word embeddings improves over maximum a posteriori (MAP) estimates in terms of hold-out likelihood, especially for smaller sample sizes, further strengthening the need for posterior sampling of word embeddings.
Problem

Research questions and friction points this paper is trying to address.

Quantify uncertainty in word embeddings for reliable text inference
Address computational infeasibility of existing Bayesian methods
Resolve non-identifiability issues in word embeddings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scalable Gibbs sampler with Polya-Gamma augmentation
Laplace approximation for uncertainty estimation
Posterior sampling improves over MAP estimates
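The core trick behind the paper's Gibbs sampler is the Pólya-Gamma augmentation of Polson, Scott and Windle: introducing latent PG variables turns the logistic likelihood into a conditionally Gaussian one, so each conditional draw is conjugate. As a minimal hedged sketch (for plain Bayesian logistic regression rather than the paper's embedding model, and using a truncated-series approximation of the PG draw instead of an exact sampler), the idea looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pg(c, trunc=200):
    """Approximate draw from Polya-Gamma PG(1, c) using its
    infinite-sum-of-gammas representation, truncated at `trunc` terms.
    (A production sampler would use an exact method instead.)"""
    c = np.atleast_1d(c)
    k = np.arange(1, trunc + 1)
    g = rng.gamma(1.0, 1.0, size=(c.shape[0], trunc))
    denom = (k - 0.5) ** 2 + (c[:, None] / (2 * np.pi)) ** 2
    return (g / denom).sum(axis=1) / (2 * np.pi ** 2)

def gibbs_logistic(X, y, n_iter=500, prior_prec=1.0):
    """Polya-Gamma Gibbs sampler for Bayesian logistic regression
    with a N(0, prior_prec^-1 I) prior on the coefficients."""
    n, d = X.shape
    beta = np.zeros(d)
    kappa = y - 0.5                      # y in {0, 1}
    draws = np.empty((n_iter, d))
    for t in range(n_iter):
        # 1. Augment: omega_i | beta ~ PG(1, x_i' beta)
        omega = sample_pg(X @ beta)
        # 2. Conjugate Gaussian draw: beta | omega, y
        prec = X.T @ (omega[:, None] * X) + prior_prec * np.eye(d)
        cov = np.linalg.inv(prec)
        mean = cov @ (X.T @ kappa)
        beta = rng.multivariate_normal(mean, cov)
        draws[t] = beta
    return draws
```

The paper applies this same augmentation per word-context pair in the embedding likelihood, which is what makes full posterior sampling tractable at corpus scale; the two-step augment-then-conjugate loop above is the essential pattern.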
Väinö Yrjänäinen
Department of Statistics, Uppsala University
Isac Boström
Department of Mathematical Sciences, Chalmers University of Technology
Måns Magnusson
Department of Statistics, Uppsala University, Sweden
Johan Jonasson
Department of Mathematical Sciences, Chalmers University of Technology

Keywords: Bayesian Statistics, Probabilistic Machine Learning, Text-as-Data, Computational Social Science