🤖 AI Summary
To address the interpretability requirements of text topic modeling and word representation learning, this paper proposes a joint modeling framework based on row-stochastic constrained DEDICOM. First, a pointwise mutual information (PMI) matrix is constructed from word co-occurrences. A row-stochastic constraint is then imposed on the DEDICOM factor matrix, so that each word's loadings form a probability distribution over the latent topics, yielding semantically clear and interpretable topic assignments. The framework jointly learns topic structure and word embeddings within a single optimization objective, simultaneously producing interpretable topic clusters and semantically meaningful word vectors. A qualitative evaluation of topic modeling and word embedding performance indicates that the method produces coherent, interpretable topic clusters together with meaningful word representations, supporting its dual aims of interpretability and representation quality.
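The first step, building a PMI matrix from word co-occurrences, can be sketched as follows. The sliding-window counting and the clipping of negative values to zero (the common positive-PMI variant) are illustrative assumptions here, not necessarily the paper's exact preprocessing:

```python
import numpy as np

def ppmi_matrix(corpus, window=2):
    """Build a (positive) PMI matrix from windowed word co-occurrence counts.

    corpus: list of tokenized sentences, e.g. [["cat", "sat"], ...]
    Returns (vocab, matrix) where matrix[i, j] = max(PMI(w_i, w_j), 0).
    """
    vocab = sorted({w for sent in corpus for w in sent})
    idx = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)

    counts = np.zeros((n, n))
    for sent in corpus:
        for i, w in enumerate(sent):
            # count every neighbor within the symmetric context window
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if i != j:
                    counts[idx[w], idx[sent[j]]] += 1.0

    total = counts.sum()
    p_ij = counts / total                    # joint co-occurrence probability
    p_i = counts.sum(axis=1) / total         # marginal word probability
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / np.outer(p_i, p_i))
    pmi[~np.isfinite(pmi)] = 0.0             # zero out log(0) entries
    return vocab, np.maximum(pmi, 0.0)       # clip negatives (PPMI)
```

Because every co-occurring pair is counted in both directions, the resulting matrix is symmetric, which matches DEDICOM's applicability to symmetric square matrices.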
📝 Abstract
The DEDICOM algorithm provides a uniquely interpretable matrix factorization method for symmetric and asymmetric square matrices. We employ a new row-stochastic variation of DEDICOM on the pointwise mutual information matrices of text corpora to identify latent topic clusters within the vocabulary and simultaneously learn interpretable word embeddings. We introduce a method to efficiently train a constrained DEDICOM algorithm and provide a qualitative evaluation of its topic modeling and word embedding performance.
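The core factorization, M ≈ A R Aᵀ with a row-stochastic loading matrix A, can be trained in the spirit described above by parameterizing A through a row-wise softmax over an unconstrained matrix W, so the constraint holds by construction. This is a minimal gradient-descent sketch of that idea, not the paper's exact training procedure; the learning rate, initialization scale, and step count are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    # numerically stable row-wise softmax
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def row_stochastic_dedicom(M, k, steps=500, lr=0.05, seed=0):
    """Factor M (n x n) as A @ R @ A.T with row-stochastic A (n x k).

    A = softmax(W) row-wise, so each word's row is a probability
    distribution over the k latent topics by construction.
    """
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    W = rng.normal(scale=0.1, size=(n, k))   # unconstrained parameters for A
    R = rng.normal(scale=0.1, size=(k, k))   # topic-affinity matrix

    for _ in range(steps):
        A = softmax(W)
        E = M - A @ R @ A.T                  # reconstruction residual
        # gradients of ||E||_F^2 w.r.t. R and A
        grad_R = -2.0 * A.T @ E @ A
        grad_A = -2.0 * (E @ A @ R.T + E.T @ A @ R)
        # chain rule through the row-wise softmax:
        # grad_W[i] = a_i * (grad_A[i] - <a_i, grad_A[i]>)
        dot = (grad_A * A).sum(axis=1, keepdims=True)
        grad_W = A * (grad_A - dot)
        W -= lr * grad_W
        R -= lr * grad_R

    return softmax(W), R
```

The softmax reparameterization turns the constrained problem into an unconstrained one, so any off-the-shelf gradient method can be used; the rows of the returned A sum to one and can be read directly as topic memberships, while A itself doubles as the word embedding matrix.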