Self-Supervised Learning for Neural Topic Models with Variance-Invariance-Covariance Regularization

📅 2025-02-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the limited topic coherence and interpretability of neural topic models (NTMs). To this end, the authors propose a self-supervised NTM featuring a novel VIC regularization (enforcing variance, invariance, and covariance constraints) within a dual-encoder architecture that jointly optimizes topic representations for anchor and adversarially augmented samples. They replace heuristic data sampling with adversarial data augmentation and extend the contrastive learning framework to enable joint optimization over positive and negative samples. Experiments on three benchmark datasets demonstrate that the model substantially outperforms both classical and state-of-the-art topic models: it achieves a +12.3% improvement in normalized pointwise mutual information (NPMI) for topic coherence and an +8.7% gain in evidence lower bound (ELBO) for document modeling. Qualitative analysis further confirms enhanced semantic clarity and stability of the learned topics.
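As a rough illustration of the VIC idea, here is a minimal sketch of a VICReg-style regularizer applied to two batches of latent topic representations (anchor and augmented views). The loss weights, the target standard deviation of 1, and the PyTorch interface are assumptions for illustration, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def vic_regularization(z_a, z_b, inv_w=1.0, var_w=1.0, cov_w=0.04, eps=1e-4):
    """VICReg-style loss on two batches of latent topic representations
    (anchor z_a and augmented z_b), each of shape (batch, n_topics).
    Coefficients are illustrative, not the paper's values."""
    # Invariance: pull the two views of each document together.
    inv_loss = F.mse_loss(z_a, z_b)

    # Variance: hinge loss keeping each topic dimension's std above 1,
    # which prevents collapse to a constant representation.
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var_loss = torch.mean(F.relu(1.0 - std_a)) + torch.mean(F.relu(1.0 - std_b))

    # Covariance: penalize off-diagonal covariance so topic dimensions
    # stay decorrelated (non-redundant).
    def off_diag_cov(z):
        z = z - z.mean(dim=0)
        cov = (z.T @ z) / (z.shape[0] - 1)
        off_diag = cov - torch.diag(torch.diag(cov))
        return off_diag.pow(2).sum() / z.shape[1]

    cov_loss = off_diag_cov(z_a) + off_diag_cov(z_b)
    return inv_w * inv_loss + var_w * var_loss + cov_w * cov_loss
```

The variance term keeps each topic dimension active, the invariance term aligns the two views, and the covariance term decorrelates dimensions, which is what prevents the constant or redundant representations described in the abstract below.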

📝 Abstract
In our study, we propose a self-supervised neural topic model (NTM) that combines the strengths of NTMs and regularized self-supervised learning methods to improve performance. NTMs use neural networks to learn latent topics hidden behind the words in documents, enabling greater flexibility and more coherent topic estimates than traditional topic models. Meanwhile, some self-supervised learning methods use a joint embedding architecture with two identical networks that produce similar representations for two augmented versions of the same input. Regularizations are applied to these representations to prevent collapse, which would otherwise cause the networks to output constant or redundant representations for all inputs. Our model enhances topic quality by explicitly regularizing the latent topic representations of anchor and positive samples. We also introduce an adversarial data augmentation method to replace heuristic sampling. We further develop several model variants, including ones built on an NTM that incorporates contrastive learning with both positive and negative samples. Experimental results on three datasets show that our models outperform baselines and state-of-the-art models both quantitatively and qualitatively.
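The adversarial augmentation step is not specified in detail here; one plausible reading is an FGSM-style perturbation of the bag-of-words input along the gradient of the model's loss, producing a hard positive view of the same document. The `elbo_loss` interface and the epsilon value below are hypothetical:

```python
import torch

def adversarial_positive(model, bow, epsilon=0.01):
    """Hypothetical FGSM-style augmentation: perturb the bag-of-words
    input in the direction that most increases the model's loss,
    yielding a hard positive view of the same document."""
    bow = bow.clone().detach().requires_grad_(True)
    loss = model.elbo_loss(bow)  # assumed NTM loss interface
    loss.backward()
    with torch.no_grad():
        perturbed = bow + epsilon * bow.grad.sign()
        perturbed = perturbed.clamp(min=0)  # keep word counts non-negative
    return perturbed.detach()
```

Compared with heuristic sampling (e.g., word dropout), a gradient-based perturbation produces positives that are maximally challenging for the current model, which is the usual motivation for adversarial augmentation.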
Problem

Research questions and friction points this paper is trying to address.

Enhancing topic quality in neural topic models
Regularizing latent topic representations
Improving performance with adversarial data augmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised neural topic model
Variance-Invariance-Covariance Regularization
Adversarial data augmentation
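The abstract also mentions variant models that add contrastive learning with both positive and negative samples. A minimal InfoNCE-style sketch, assuming in-batch negatives and a temperature of 0.5 (both assumptions, since the exact variant is not described here):

```python
import torch
import torch.nn.functional as F

def contrastive_topic_loss(z_anchor, z_pos, temperature=0.5):
    """InfoNCE-style loss over a batch of topic representations:
    each anchor's positive is its augmented view; all other
    in-batch documents serve as negatives."""
    z_a = F.normalize(z_anchor, dim=1)
    z_p = F.normalize(z_pos, dim=1)
    logits = z_a @ z_p.T / temperature  # (batch, batch) cosine similarities
    labels = torch.arange(z_a.shape[0], device=z_a.device)
    return F.cross_entropy(logits, labels)  # diagonal entries are positives
```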
Weiran Xu
Associate professor of natural language processing, Beijing University of Posts and Telecommunications
Kengo Hirami
Graduate School of Advanced Science and Engineering, Hiroshima University
Koji Eguchi
Graduate School of Advanced Science and Engineering, Hiroshima University