Consistent Amortized Clustering via Generative Flow Networks

📅 2025-02-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing neural networks for amortized probabilistic clustering have two key limitations: methods that label data points sequentially, such as the Neural Clustering Process, produce cluster assignments that depend on the input order, while methods that generate full clusters one at a time do not provide explicit assignment probabilities. To address both, the authors propose GFNCP, an order-agnostic amortized clustering framework grounded in Generative Flow Networks (GFNs). GFNCP models clustering as a flow-generation process over sets, unifying policy and reward via a shared energy-based parametrization; the flow-matching conditions are shown to be equivalent to consistency of the clustering posterior under marginalization, which in turn implies order invariance of the assignments. The method combines energy-based modeling, flow matching, and set-invariant architectures, enabling end-to-end differentiable training. On synthetic and real-world benchmarks, GFNCP outperforms existing methods in clustering metrics such as purity and normalized mutual information (NMI).

📝 Abstract
Neural models for amortized probabilistic clustering yield samples of cluster labels given a set-structured input, while avoiding lengthy Markov chain runs and the need for explicit data likelihoods. Existing methods that label each data point sequentially, like the Neural Clustering Process, often lead to cluster assignments that are highly dependent on the data order. Alternatively, methods that sequentially create full clusters do not provide assignment probabilities. In this paper, we introduce GFNCP, a novel framework for amortized clustering. GFNCP is formulated as a Generative Flow Network with a shared energy-based parametrization of policy and reward. We show that the flow matching conditions are equivalent to consistency of the clustering posterior under marginalization, which in turn implies order invariance. GFNCP also outperforms existing methods in clustering performance on both synthetic and real-world data.
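The consistency property claimed in the abstract — a sequential assignment policy obtained by marginalizing a partition-level posterior reproduces that posterior exactly, step by step — can be illustrated with a small enumeration sketch. Everything here is a toy assumption for exposition (the energy function, the brute-force marginalization, and all names are made up), not the paper's GFNCP parametrization or its flow-matching training objective:

```python
import math

def partitions_as_labels(n):
    """Enumerate all partitions of n items as canonical label sequences
    (each new cluster gets the next unused label)."""
    results = []
    def rec(labels):
        if len(labels) == n:
            results.append(tuple(labels))
            return
        k = max(labels) + 1 if labels else 0
        for c in range(k + 1):  # join an existing cluster or open a new one
            rec(labels + [c])
    rec([])
    return results

def energy(labels):
    """Toy energy that just penalizes the number of clusters
    (an illustrative assumption, not the paper's learned energy)."""
    return 1.5 * (max(labels) + 1)

# Terminal reward R(x) = exp(-E(x)), normalized into a posterior over partitions.
parts = partitions_as_labels(3)
Z = sum(math.exp(-energy(p)) for p in parts)
post = {p: math.exp(-energy(p)) / Z for p in parts}

def policy(prefix):
    """Sequential assignment policy for the next point, obtained by
    marginalizing the posterior over all completions of the prefix."""
    k = max(prefix) + 1 if prefix else 0
    mass = {}
    for c in range(k + 1):
        ext = tuple(prefix) + (c,)
        mass[c] = sum(pr for p, pr in post.items() if p[:len(ext)] == ext)
    total = sum(mass.values())
    return {c: m / total for c, m in mass.items()}

# Consistency check: multiplying the sequential policy steps recovers the
# partition posterior exactly, so no probability mass is lost mid-sequence.
for p in parts:
    prob = 1.0
    for i in range(len(p)):
        prob *= policy(list(p[:i]))[p[i]]
    assert abs(prob - post[p]) < 1e-12
```

Because the posterior is defined on partitions (which carry no point ordering), any policy consistent with it in this marginal sense is automatically order-invariant — the property GFNCP enforces through its flow-matching conditions rather than by brute-force enumeration.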
Problem

Research questions and friction points this paper is trying to address.

Order dependence of sequential cluster assignments in amortized clustering
Lack of explicit assignment probabilities in methods that generate full clusters
Improving clustering performance across synthetic and real-world datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Formulation of amortized clustering as a Generative Flow Network
Shared energy-based parametrization of policy and reward
Flow matching conditions that imply an order-invariant clustering posterior