🤖 AI Summary
This work addresses a key limitation of traditional Latent Dirichlet Allocation (LDA): its symmetric Dirichlet prior cannot capture correlations or hierarchical structure among topics. To overcome this, we propose Latent Dirichlet-Tree Allocation (LDTA), which introduces a Dirichlet-Tree prior into the topic-modeling framework for the first time. LDTA preserves LDA's generative structure while explicitly modeling tree-structured dependencies among topic proportions. We derive the corresponding mean-field variational inference and expectation propagation algorithms, show that both admit fully vectorized updates, and implement them with GPU acceleration. The approach substantially enhances representational capacity over LDA while retaining scalability, enabling efficient Bayesian inference.
📝 Abstract
Latent Dirichlet Allocation (LDA) is a foundational model for discovering latent thematic structure in discrete data, but its Dirichlet prior cannot represent the rich correlations and hierarchical relationships often present among topics. We introduce Latent Dirichlet-Tree Allocation (LDTA), a generalization of LDA that replaces the Dirichlet prior with an arbitrary Dirichlet-Tree (DT) distribution. LDTA preserves LDA's generative structure while enabling expressive, tree-structured priors over topic proportions. For inference, we develop general mean-field variational inference and expectation propagation algorithms whose updates remain tractable for any DT prior. Our theoretical development reveals that both inference methods are naturally vectorizable, and we provide fully vectorized, GPU-accelerated implementations. The resulting framework substantially expands the modeling capacity of LDA while maintaining scalability and computational efficiency.
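To make the central idea concrete, here is a minimal NumPy sketch of drawing topic proportions from a Dirichlet-Tree prior. The tree structure, node names, and concentration values below are illustrative assumptions, not taken from the paper: each internal node holds a Dirichlet distribution over its children, and a leaf topic's probability is the product of the branch probabilities along its root-to-leaf path. With a single root node over all topics, this reduces to the flat Dirichlet prior of standard LDA.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tree: root -> {A, B}; A -> {t0, t1}; B -> {t2, t3}.
# Each internal node maps to (children, Dirichlet concentration parameters).
tree = {
    "root": (["A", "B"], np.array([2.0, 2.0])),
    "A": (["t0", "t1"], np.array([1.0, 3.0])),
    "B": (["t2", "t3"], np.array([3.0, 1.0])),
}

def sample_theta(tree, node="root", mass=1.0, out=None):
    """Recursively sample branch probabilities; accumulate leaf masses."""
    if out is None:
        out = {}
    if node not in tree:            # leaf: its probability is the path product
        out[node] = mass
        return out
    children, alpha = tree[node]
    branch = rng.dirichlet(alpha)   # Dirichlet draw at this internal node
    for child, p in zip(children, branch):
        sample_theta(tree, child, mass * p, out)
    return out

theta = sample_theta(tree)
print(theta)                        # leaf probabilities sum to 1
```

Correlations arise because sibling leaves (e.g. `t0` and `t1`) share the branch mass allocated to their parent, so their proportions co-vary in a way a flat Dirichlet cannot express.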