AI Summary
This work addresses the limitations of traditional dynamic topic models, which struggle to capture high-order word interactions and conflate word occurrence with repetition under a single mechanism, thereby hindering the modeling of overlapping semantics. To overcome these limitations, the authors propose a hypergraph-based text representation: documents are modeled as hyperedges connecting co-occurring words, with node weights encoding term frequencies, thus explicitly decoupling occurrence from repetition. Building on this, they introduce a novel hypergraph multinomial distribution with nonlinear normalization, integrated with structured low-rank decomposition and temporal regularization for dynamic topic modeling. The approach leverages hypergraph structure to disentangle the occurrence and repetition mechanisms, and is accompanied by theoretical guarantees of local convergence and non-asymptotic error bounds despite the non-convex optimization. Experiments on synthetic data and a corpus of ICLR papers demonstrate consistent improvements over existing multinomial-based topic models.
Abstract
Dynamic topic modeling is widely used to analyze evolving trends in scientific literature, medical records, and social media. Traditional topic models represent each topic through a single probability vector on the multinomial simplex and implicitly couple word occurrence and repetition within one probabilistic mechanism. However, this formulation restricts the dependence structure among words and overlooks informative higher-order interactions, particularly in dynamic corpora with overlapping semantics. To address these limitations, we introduce a hypergraph representation of text where each document is modeled as a hyperedge connecting all co-occurring words, with repetition intensities encoded as node weights. This representation naturally separates word occurrence from repetition and induces a novel hypergraph-based multinomial distribution with a nonlinear normalization depending on the observed word set of each document. Building on this likelihood, we develop a dynamic topic modeling framework via structured low-rank factorizations with explicit temporal regularization on topic-word profiles. Moreover, we establish local convergence guarantees and derive non-asymptotic error bounds despite the intrinsic nonconvexity induced by bilinear factorization and document-specific nonlinear normalization. Numerical experiments on synthetic data and an application to the International Conference on Learning Representations (ICLR) corpus demonstrate consistent improvements over existing multinomial-based topic models.
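The hypergraph representation described in the abstract can be illustrated with a minimal Python sketch. It covers only the data representation (a hyperedge of distinct words plus per-word repetition counts), not the hypergraph multinomial likelihood or the factorization, whose exact forms are not given here. The function name `document_to_hyperedge` and the toy documents are hypothetical and chosen for illustration.

```python
from collections import Counter


def document_to_hyperedge(tokens):
    """Split a tokenized document into its occurrence and repetition parts.

    Hypothetical helper for illustration; not taken from the paper.
    """
    counts = Counter(tokens)
    # Occurrence: the set of distinct words forms the document's hyperedge.
    hyperedge = frozenset(counts)
    # Repetition: how many times each occurring word appears (node weights).
    node_weights = dict(counts)
    return hyperedge, node_weights


# Toy corpus (assumed data): each document is a list of tokens.
docs = [
    ["graph", "neural", "graph", "network"],
    ["topic", "model", "topic", "topic"],
]

# One hyperedge per document, with its node weights.
hypergraph = [document_to_hyperedge(d) for d in docs]

for edge, weights in hypergraph:
    print(sorted(edge), weights)
```

Keeping the word set and the counts as separate objects mirrors the separation of occurrence from repetition: a model can condition on which words appear in a document independently of how often each one repeats.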