🤖 AI Summary
This paper challenges the necessity of neural topic models (NTMs), arguing that their architectural complexity and training requirements are not essential for effective topic modeling. Method: It reframes topic modeling as a long-form text generation task, leveraging large language models (LLMs) with extended context windows to generate topic descriptions and representative document snippets via zero-shot prompting, eliminating the need for model training. The approach combines data-subset sampling with keyword-matching-based text assignment to improve coherence and coverage. Contribution/Results: Experiments demonstrate that this paradigm matches or surpasses state-of-the-art NTMs in topic coherence, interpretability, and document coverage. Crucially, it is the first work to formally cast topic modeling as a generation task natively supported by LLMs, empirically challenging the prevailing view that NTMs remain indispensable. This advances topic modeling toward lightweight, generalizable, zero-shot paradigms.
📝 Abstract
Traditional topic models, including neural topic models (NTMs), rely on inference and generation networks to learn latent topic distributions. This paper explores a new paradigm for topic modeling in the era of large language models (LLMs), reframing topic modeling as a long-form generation task and updating its definition accordingly. We propose a simple but practical approach that performs LLM-based topic modeling out of the box: sample a data subset, generate topics and representative text with our prompt, and assign texts to topics by keyword matching. We then investigate whether the long-form generation paradigm can beat NTMs via zero-shot prompting. We conduct a systematic comparison between NTMs and LLMs in terms of topic quality and empirically examine the claim that "a majority of NTMs are outdated."
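The three-step pipeline described in the abstract (subset sampling, zero-shot topic generation, keyword-match assignment) could be sketched as below. This is a minimal illustration, not the paper's implementation: the prompt wording, the `topic name: keywords` output format, and all function names are assumptions, and the actual LLM call is left as a placeholder.

```python
import random

def sample_subset(docs, k, seed=0):
    """Step 1: sample a data subset small enough for the LLM context window."""
    rng = random.Random(seed)
    return rng.sample(docs, min(k, len(docs)))

def build_prompt(docs):
    """Step 2 (input): a hypothetical zero-shot prompt asking for topics with keywords."""
    joined = "\n".join(f"- {d}" for d in docs)
    return ("Read the documents below and list the main topics, one per line, "
            "formatted as 'topic name: comma-separated keywords'.\n" + joined)

def parse_topics(llm_output):
    """Step 2 (output): parse 'name: kw1, kw2' lines into {name: [keywords]}.

    In practice llm_output would come from an LLM API call on build_prompt(...).
    """
    topics = {}
    for line in llm_output.strip().splitlines():
        name, _, kws = line.partition(":")
        topics[name.strip()] = [k.strip().lower() for k in kws.split(",") if k.strip()]
    return topics

def assign_by_keywords(docs, topics):
    """Step 3: assign each document to the topic whose keywords it matches most.

    Documents matching no keyword are left as None (uncovered).
    """
    assignments = {}
    for doc in docs:
        words = doc.lower().split()
        best, best_hits = None, 0
        for name, kws in topics.items():
            hits = sum(words.count(kw) for kw in kws)
            if hits > best_hits:
                best, best_hits = name, hits
        assignments[doc] = best
    return assignments
```

Because assignment is plain keyword counting over the generated topic descriptions, the whole pipeline needs no training; only the single generation step touches the LLM.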