Transformers can do Bayesian Clustering

📅 2025-10-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the efficiency, accuracy, and missing-data challenges of Bayesian clustering on large datasets, this paper proposes Cluster-PFN, which extends Prior-Data Fitted Networks (PFNs) to unsupervised Bayesian clustering. Built on a Transformer architecture, Cluster-PFN is trained entirely on synthetic data generated from a finite Gaussian mixture model (GMM) prior and learns to estimate the posterior over both the number of clusters and the cluster assignments. When the prior includes missing data, the model handles missing values directly, without a separate imputation step. In the reported experiments, Cluster-PFN estimates the number of clusters more accurately than AIC, BIC, and variational inference (VI), reaches clustering quality competitive with VI while being orders of magnitude faster, and outperforms imputation-based baselines on real-world genomic data at high missingness.
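For context, the classical criteria named in the summary choose the number of clusters by fitting one mixture per candidate K and scoring each fit. The sketch below shows how AIC and BIC would be computed for a full-covariance GMM. The function names and the log-likelihood values in the usage note are hypothetical and are not taken from the paper.

```python
import math


def gmm_num_params(k: int, d: int) -> int:
    """Free parameters of a k-component, d-dimensional full-covariance GMM."""
    weights = k - 1               # mixture weights sum to 1
    means = k * d                 # one mean vector per component
    covs = k * d * (d + 1) // 2   # one symmetric covariance per component
    return weights + means + covs


def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik: float, n_params: int, n_samples: int) -> float:
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_samples) - 2 * log_lik


def select_k(log_liks: dict, d: int, n_samples: int, criterion: str = "bic") -> int:
    """Pick the candidate K with the lowest criterion value.

    log_liks maps each candidate K to the maximised log-likelihood of a
    GMM fitted with K components (the fitting itself is not shown here).
    """
    def score(k: int) -> float:
        p = gmm_num_params(k, d)
        if criterion == "aic":
            return aic(log_liks[k], p)
        return bic(log_liks[k], p, n_samples)

    return min(log_liks, key=score)
```

With made-up log-likelihoods such as `select_k({1: -520.0, 2: -430.0, 3: -427.0}, d=2, n_samples=200)`, the small gain from K=2 to K=3 does not pay for the six extra parameters, so the criterion prefers K=2. Each candidate K requires its own fit, which is part of the cost that an amortised approach like Cluster-PFN aims to avoid.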

📝 Abstract

Bayesian clustering accounts for uncertainty but is computationally demanding at scale. Furthermore, real-world datasets often contain missing values, and simple imputation ignores the associated uncertainty, resulting in suboptimal results. We present Cluster-PFN, a Transformer-based model that extends Prior-Data Fitted Networks (PFNs) to unsupervised Bayesian clustering. Trained entirely on synthetic datasets generated from a finite Gaussian Mixture Model (GMM) prior, Cluster-PFN learns to estimate the posterior distribution over both the number of clusters and the cluster assignments. Our method estimates the number of clusters more accurately than handcrafted model selection procedures such as AIC, BIC and Variational Inference (VI), and achieves clustering quality competitive with VI while being orders of magnitude faster. Cluster-PFN can be trained on complex priors that include missing data, outperforming imputation-based baselines on real-world genomic datasets at high missingness. These results show that Cluster-PFN can provide scalable and flexible Bayesian clustering.
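To make the training setup concrete, here is a minimal sketch of how one synthetic dataset could be drawn from a finite GMM prior with missing entries. Every specific choice is an assumption for illustration and is not the paper's actual prior: the uniform prior over the number of components, the Dirichlet(1) weights, the isotropic component scales, the value ranges, and the completely-at-random missingness. The function name `sample_gmm_dataset` is hypothetical.

```python
import random


def sample_gmm_dataset(n_points, dim, max_k, missing_rate=0.0, rng=None):
    """Draw one synthetic clustering task from a simple finite GMM prior.

    Returns (points, labels, k). Missing entries are encoded as None.
    All hyperparameters below are illustrative assumptions.
    """
    rng = rng or random.Random()

    # Number of mixture components: uniform on {1, ..., max_k} (assumption).
    k = rng.randint(1, max_k)

    # Mixture weights: symmetric Dirichlet(1), via normalised Gamma draws.
    raw = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
    total = sum(raw)
    weights = [w / total for w in raw]

    # Component means and isotropic standard deviations (assumption).
    means = [[rng.gauss(0.0, 3.0) for _ in range(dim)] for _ in range(k)]
    scales = [rng.uniform(0.3, 1.0) for _ in range(k)]

    points, labels = [], []
    for _ in range(n_points):
        z = rng.choices(range(k), weights=weights)[0]  # cluster assignment
        x = [rng.gauss(means[z][j], scales[z]) for j in range(dim)]
        # Drop each feature independently with probability missing_rate.
        x = [None if rng.random() < missing_rate else v for v in x]
        points.append(x)
        labels.append(z)
    return points, labels, k
```

A call such as `sample_gmm_dataset(200, 2, max_k=5, missing_rate=0.3, rng=random.Random(0))` yields one training example: the points are the model input, and the labels and component count are the targets. Note that `k` counts mixture components, so a component with a small weight may end up with no points in a given draw.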

Problem

Research questions and friction points this paper is trying to address.

Scaling Bayesian clustering efficiently for large datasets

Handling missing data uncertainty in clustering without imputation

Accurately estimating cluster counts and assignments using transformers

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based model for Bayesian clustering

Trained on synthetic Gaussian Mixture Model data

Handles missing values without imputation, accounting for the associated uncertainty

🔎 Similar Papers

A mathematical perspective on Transformers