CobwebTM: Probabilistic Concept Formation for Lifelong and Hierarchical Topic Modeling

📅 2026-04-15
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work targets two limitations of neural topic models: catastrophic forgetting in lifelong learning and the need to fix the number of topics in advance. It also addresses the difficulty classical probabilistic models have with streaming data. The authors propose CobwebTM, an online topic model that applies the incremental probabilistic concept formation of the Cobweb algorithm to pretrained document embeddings in a continuous semantic space. The method discovers topics hierarchically and without supervision, creates new topics as data arrives, and supports lifelong learning without a preset topic count. Across multiple datasets, the authors report strong topic coherence, topics that remain stable over time, and high-quality hierarchies.

📝 Abstract
Topic modeling seeks to uncover latent semantic structure in text corpora with minimal supervision. Neural approaches achieve strong performance but require extensive tuning and struggle with lifelong learning due to catastrophic forgetting and fixed capacity, while classical probabilistic models lack flexibility and adaptability to streaming data. We introduce CobwebTM, a low-parameter lifelong hierarchical topic model based on incremental probabilistic concept formation. By adapting the Cobweb algorithm to continuous document embeddings, CobwebTM constructs semantic hierarchies online, enabling unsupervised topic discovery, dynamic topic creation, and hierarchical organization without predefining the number of topics. Across diverse datasets, CobwebTM achieves strong topic coherence, stable topics over time, and high-quality hierarchies, demonstrating that incremental symbolic concept formation combined with pretrained representations is an efficient approach to topic modeling.
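
To make the idea of incremental concept formation over continuous vectors more concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a single level of concepts instead of a full hierarchy, a Cobweb/3-style category utility for Gaussian attributes, and only two operators (add to an existing concept, or create a new one); the merge and split operators of the full Cobweb algorithm are omitted. The names `Concept`, `FlatCobweb`, and `ACUITY` are hypothetical, and the toy 2-D vectors stand in for pretrained document embeddings.

```python
import math

ACUITY = 0.5  # assumed floor on per-dimension standard deviation


class Concept:
    """Running Gaussian summary (count, mean, M2) of the vectors in a concept."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # sum of squared deviations per dimension

    def copy(self):
        c = Concept(len(self.mean))
        c.n, c.mean, c.m2 = self.n, self.mean[:], self.m2[:]
        return c

    def add(self, x):
        # Welford's online update of mean and squared deviations.
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def score(self):
        # Sum over dimensions of 1/sigma, with sigma floored at ACUITY.
        return sum(1.0 / max(math.sqrt(m2 / self.n), ACUITY) for m2 in self.m2)


def category_utility(root, children):
    """Cobweb/3-style utility of partitioning root into children."""
    gain = sum(c.n / root.n * (c.score() - root.score()) for c in children)
    return gain / (len(children) * 2.0 * math.sqrt(math.pi))


class FlatCobweb:
    """Single-level sketch: no preset number of concepts, one pass over the data."""

    def __init__(self, dim):
        self.root = Concept(dim)
        self.children = []

    def insert(self, x):
        self.root.add(x)
        best_cu, best_children, best_idx = -math.inf, None, None
        # Option 1: add x to each existing concept in turn.
        for idx, child in enumerate(self.children):
            trial = child.copy()
            trial.add(x)
            cand = self.children[:idx] + [trial] + self.children[idx + 1:]
            cu = category_utility(self.root, cand)
            if cu > best_cu:
                best_cu, best_children, best_idx = cu, cand, idx
        # Option 2: open a new concept; chosen only if strictly better.
        fresh = Concept(len(x))
        fresh.add(x)
        cand = self.children + [fresh]
        if category_utility(self.root, cand) > best_cu:
            best_children, best_idx = cand, len(self.children)
        self.children = best_children
        return best_idx


model = FlatCobweb(dim=2)
stream = [[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]]
print([model.insert(v) for v in stream])  # prints [0, 0, 1, 1]
```

Each vector is processed once and then discarded; only per-concept counts, means, and variances are stored. In this toy stream the third vector opens a second concept because doing so yields a higher category utility than absorbing it into the first, which illustrates how the number of concepts can grow with the data instead of being fixed up front.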
Problem

Research questions and friction points this paper is trying to address.

lifelong learning
topic modeling
catastrophic forgetting
streaming data
hierarchical organization
Innovation

Methods, ideas, or system contributions that make the work stand out.

lifelong learning
hierarchical topic modeling
probabilistic concept formation
incremental learning
topic coherence
🔎 Similar Papers
2024-04-02
North American Chapter of the Association for Computational Linguistics
Citations: 2