🤖 AI Summary
Modeling multivariate count data with unknown latent feature dimensionality and heterogeneous yet non-independent group structures remains challenging.
Method: This paper proposes a Bayesian nonparametric framework based on completely random vectors, relaxing the standard full exchangeability assumption in favor of partial exchangeability. It treats the number of latent features as a random quantity and introduces a mixture model that infers the group partition from the data, thereby avoiding the systematic overclustering that arises when the number of features is assumed to be fixed.
Contribution/Results: Theoretically, closed-form expressions for marginal and posterior distributions are derived and illustrated for binary and Poisson-distributed observations. Algorithmically, the framework remains tractable and interpretable. Empirically, applied to the 'Ndrangheta criminal network from the Operazione Infinito investigation, it uncovers hidden organizational subgroups and functional divisions, providing insights into the organization of illicit activities.
📝 Abstract
Feature and trait allocation models are fundamental objects in Bayesian nonparametrics and play a prominent role in several applications. Existing approaches, however, typically assume full exchangeability of the data, which may be restrictive in settings characterized by heterogeneous but related groups. In this paper, we introduce a general and tractable class of Bayesian nonparametric priors for partially exchangeable trait allocation models, relying on completely random vectors. We provide a comprehensive theoretical analysis, including closed-form expressions for marginal and posterior distributions, and illustrate the tractability of our framework in the cases of binary and Poisson-distributed traits. A distinctive aspect of our approach is that the number of traits is a random quantity, thereby allowing us to model and estimate unobserved traits. Building on these results, we also develop a novel mixture model that infers the group partition structure from the data, effectively clustering trait allocations. This extension generalizes Bayesian nonparametric latent class models and avoids the systematic overclustering that arises when the number of traits is assumed to be fixed. We demonstrate the practical usefulness of our methodology through an application to the 'Ndrangheta criminal network from the Operazione Infinito investigation, where our model provides insights into the organization of illicit activities.
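To make the idea of a random number of traits concrete, the sketch below simulates a binary feature allocation from the classical one-parameter Indian buffet process. This is the standard fully exchangeable baseline, not the partially exchangeable prior introduced in the paper. The function names `sample_poisson` and `sample_ibp` and the parameter `alpha` are illustrative assumptions and do not come from the source.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw a Poisson(rate) variate with Knuth's method (adequate for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_subjects, alpha, seed=0):
    """Simulate a binary feature allocation from the one-parameter Indian buffet process.

    Illustrative baseline only (fully exchangeable); not the paper's model.
    Returns a list of rows, where rows[i][k] == 1 if subject i displays feature k.
    """
    rng = random.Random(seed)
    feature_counts = []  # feature_counts[k] = number of subjects so far with feature k
    allocations = []     # one set of feature indices per subject
    for i in range(1, num_subjects + 1):
        owned = set()
        # Existing features: subject i displays feature k with probability m_k / i.
        for k, m_k in enumerate(feature_counts):
            if rng.random() < m_k / i:
                owned.add(k)
        # New features: subject i introduces Poisson(alpha / i) unseen features.
        for _ in range(sample_poisson(alpha / i, rng)):
            feature_counts.append(0)
            owned.add(len(feature_counts) - 1)
        for k in owned:
            feature_counts[k] += 1
        allocations.append(owned)
    total = len(feature_counts)  # random: not fixed in advance
    return [[1 if k in owned else 0 for k in range(total)] for owned in allocations]


if __name__ == "__main__":
    for row in sample_ibp(num_subjects=6, alpha=2.0, seed=1):
        print(row)
```

The number of columns in the returned matrix varies from draw to draw, which is the sense in which the number of traits is random. Replacing the binary entries with counts would give a trait allocation rather than a feature allocation. Handling several related groups, as the abstract describes, requires the paper's construction and is not attempted here.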