Interpretable Topic Extraction and Word Embedding Learning using row-stochastic DEDICOM

📅 2025-07-22
🤖 AI Summary
To make topic modeling and word representation learning more interpretable, this paper applies a row-stochastic variant of DEDICOM to text corpora. A pointwise mutual information (PMI) matrix is first built from word co-occurrences. The DEDICOM factorization of that matrix is then constrained so that each row of the loading matrix is non-negative and sums to one, which lets each word's row be read as a distribution over latent topics. The same factorization yields both the topic clusters and the word embeddings, so the two are learned jointly under a single objective. The paper also describes an efficient training procedure for the constrained model and evaluates its topic modeling and word embedding performance qualitatively.
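As a rough illustration of the PMI construction step mentioned in the summary, the sketch below counts co-occurrences in a symmetric sliding window and converts them to PMI values. The function name `ppmi_matrix`, the `window` parameter, and the clipping of negative values to zero (positive PMI) are assumptions made for this example, not details taken from the page.

```python
import numpy as np

def ppmi_matrix(sentences, window=2):
    """Build a positive PMI matrix from tokenized sentences (illustrative sketch)."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))

    # Count co-occurrences within a symmetric window around each token.
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[s[j]]] += 1.0

    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)

    # PMI(i, j) = log( p(i, j) / (p(i) * p(j)) ); unseen pairs give -inf or nan.
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0

    # Assumption: clip negatives to zero (positive PMI).
    return vocab, np.maximum(pmi, 0.0)

sentences = [["cat", "sat", "mat"], ["dog", "sat", "mat"]]
vocab, S = ppmi_matrix(sentences, window=1)
```

Because the window is symmetric, the resulting matrix is symmetric as well; an asymmetric (for example, right-only) window would give an asymmetric square matrix.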

📝 Abstract
The DEDICOM algorithm provides a uniquely interpretable matrix factorization method for symmetric and asymmetric square matrices. We employ a new row-stochastic variation of DEDICOM on the pointwise mutual information matrices of text corpora to identify latent topic clusters within the vocabulary and simultaneously learn interpretable word embeddings. We introduce a method to efficiently train a constrained DEDICOM algorithm and a qualitative evaluation of its topic modeling and word embedding performance.
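To make the factorization described in the abstract concrete, here is a minimal sketch that approximates a square matrix S by A R Aᵀ with a row-stochastic A. Two choices here are assumptions for illustration only: the constraint is imposed by applying a row-wise softmax to an unconstrained parameter matrix Z, and the fit uses plain gradient descent on the squared Frobenius error. The function names, step count, and learning rate are hypothetical and not taken from the paper.

```python
import numpy as np

def softmax_rows(z):
    """Row-wise softmax: every row is non-negative and sums to one."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_row_stochastic_dedicom(S, k, steps=500, lr=0.01, seed=0):
    """Fit S ~ A @ R @ A.T with A = softmax_rows(Z) (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    Z = rng.normal(scale=0.1, size=(n, k))   # unconstrained parameters behind A
    R = rng.normal(scale=0.1, size=(k, k))   # k x k affinity matrix
    losses = []
    for _ in range(steps):
        A = softmax_rows(Z)
        E = A @ R @ A.T - S                  # residual
        losses.append(float((E ** 2).sum()))
        # Gradients of the squared Frobenius error.
        grad_R = 2.0 * A.T @ E @ A
        grad_A = 2.0 * (E @ A @ R.T + E.T @ A @ R)
        # Chain rule through the row-wise softmax.
        grad_Z = A * (grad_A - (grad_A * A).sum(axis=1, keepdims=True))
        Z -= lr * grad_Z
        R -= lr * grad_R
    return softmax_rows(Z), R, losses

rng = np.random.default_rng(1)
M = rng.random((6, 6))
S_demo = (M + M.T) / 2.0                     # small symmetric toy matrix
A, R, losses = fit_row_stochastic_dedicom(S_demo, k=2, steps=200)
```

In this sketch each row of `A` can be read as a word's distribution over `k` topics and also used as its `k`-dimensional embedding, while `R` describes how the topics relate to each other. The fixed learning rate is tuned for the toy input and would likely need adjusting, or replacing with an adaptive optimizer, for a real PMI matrix.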
Problem

Research questions and friction points this paper is trying to address.

Extract interpretable topics from text corpora
Learn meaningful word embeddings simultaneously
Train the constrained DEDICOM model efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Row-stochastic DEDICOM for topic extraction
Interpretable word embeddings via PMI matrices
Efficient constrained DEDICOM training method