🤖 AI Summary
To address the interpretability requirements of text topic modeling and word representation learning, this paper proposes a joint modeling framework based on row-stochastic constrained DEDICOM. First, a pointwise mutual information (PMI) matrix is constructed from word co-occurrences. A row-stochastic constraint is then imposed on the DEDICOM factor matrix, so that each word's loadings form a probability distribution over the latent topics, yielding semantically clear and interpretable topic assignments. The framework jointly learns topic structure and word embeddings within a single optimization objective, simultaneously producing interpretable topic clusters and semantically meaningful word vectors. A qualitative evaluation of topic modeling and word embedding performance indicates that the method produces coherent, interpretable topic clusters together with meaningful word representations, supporting its dual aims of interpretability and representation quality.
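The first step, building a PMI matrix from word co-occurrences, can be sketched as follows. The sliding-window counting and the clipping of negative values to zero (the common positive-PMI variant) are illustrative assumptions here, not necessarily the paper's exact preprocessing:

```python
import numpy as np

def ppmi_matrix(corpus, window=2):
    """Build a (positive) PMI matrix from windowed word co-occurrence counts.

    corpus: list of tokenized sentences, e.g. [["cat", "sat"], ...]
    Returns (vocab, matrix) where matrix[i, j] = max(PMI(w_i, w_j), 0).
    """
    vocab = sorted({w for sent in corpus for w in sent})
    idx = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)

    counts = np.zeros((n, n))
    for sent in corpus:
        for i, w in enumerate(sent):
            # count every neighbor within the symmetric context window
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if i != j:
                    counts[idx[w], idx[sent[j]]] += 1.0

    total = counts.sum()
    p_ij = counts / total                    # joint co-occurrence probability
    p_i = counts.sum(axis=1) / total         # marginal word probability
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / np.outer(p_i, p_i))
    pmi[~np.isfinite(pmi)] = 0.0             # zero out log(0) entries
    return vocab, np.maximum(pmi, 0.0)       # clip negatives (PPMI)
```

Because every co-occurring pair is counted in both directions, the resulting matrix is symmetric, which matches DEDICOM's applicability to symmetric square matrices.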
📝 Abstract
The DEDICOM algorithm provides a uniquely interpretable matrix factorization method for symmetric and asymmetric square matrices. We employ a new row-stochastic variation of DEDICOM on the pointwise mutual information matrices of text corpora to identify latent topic clusters within the vocabulary and simultaneously learn interpretable word embeddings. We introduce a method to efficiently train a constrained DEDICOM algorithm and provide a qualitative evaluation of its topic modeling and word embedding performance.
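The core factorization, M ≈ A R Aᵀ with a row-stochastic loading matrix A, can be trained in the spirit described above by parameterizing A through a row-wise softmax over an unconstrained matrix W, so the constraint holds by construction. This is a minimal gradient-descent sketch of that idea, not the paper's exact training procedure; the learning rate, initialization scale, and step count are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    # numerically stable row-wise softmax
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def row_stochastic_dedicom(M, k, steps=500, lr=0.05, seed=0):
    """Factor M (n x n) as A @ R @ A.T with row-stochastic A (n x k).

    A = softmax(W) row-wise, so each word's row is a probability
    distribution over the k latent topics by construction.
    """
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    W = rng.normal(scale=0.1, size=(n, k))   # unconstrained parameters for A
    R = rng.normal(scale=0.1, size=(k, k))   # topic-affinity matrix

    for _ in range(steps):
        A = softmax(W)
        E = M - A @ R @ A.T                  # reconstruction residual
        # gradients of ||E||_F^2 w.r.t. R and A
        grad_R = -2.0 * A.T @ E @ A
        grad_A = -2.0 * (E @ A @ R.T + E.T @ A @ R)
        # chain rule through the row-wise softmax:
        # grad_W[i] = a_i * (grad_A[i] - <a_i, grad_A[i]>)
        dot = (grad_A * A).sum(axis=1, keepdims=True)
        grad_W = A * (grad_A - dot)
        W -= lr * grad_W
        R -= lr * grad_R

    return softmax(W), R
```

The softmax reparameterization turns the constrained problem into an unconstrained one, so any off-the-shelf gradient method can be used; the rows of the returned A sum to one and can be read directly as topic memberships, while A itself doubles as the word embedding matrix.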