AI Summary
This study addresses the limited interpretability and practical utility of traditional topic models when applied to large-scale text corpora such as social media, where keyword lists often fail to convey the core topic of a document cluster accurately and intuitively. To overcome this limitation, the authors propose NTLRAG, a scalable and extensible framework that applies retrieval-augmented generation (RAG) with chain-of-thought elements to topic label generation. The framework can be combined with any standard topic model and uses multiple retrieval strategies to turn raw keyword lists into semantically precise, human-readable narrative topic labels. In a user study on three real-world datasets comprising more than 6.7 million social media messages, 16 human evaluators found the generated labels more interpretable and usable than traditional keyword lists.
Abstract
Topic modeling has evolved into an important means to identify evident or hidden topics within large collections of text documents. Topic modeling approaches are often used for analyzing and making sense of social media discussions consisting of millions of short text messages. However, assigning meaningful topic labels to document clusters remains challenging, as users are commonly presented with unstructured keyword lists that may not accurately capture the respective core topic. In this paper, we introduce Narrative Topic Labels derived with Retrieval Augmented Generation (NTLRAG), a scalable and extensible framework that generates semantically precise and human-interpretable narrative topic labels. Our narrative topic labels provide a context-rich, intuitive concept to describe topic model output. In particular, NTLRAG uses retrieval augmented generation (RAG) techniques and considers multiple retrieval strategies as well as chain-of-thought elements to provide high-quality output. NTLRAG can be combined with any standard topic model to generate, validate, and refine narratives which then serve as narrative topic labels. We evaluated NTLRAG with a user study and three real-world datasets consisting of more than 6.7 million social media messages that were sent by more than 2.7 million users. The user study involved 16 human evaluators who found that our narrative topic labels offer superior interpretability and usability compared to traditional keyword lists. An implementation of NTLRAG is publicly available for download.
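The abstract names the building blocks (multiple retrieval strategies, chain-of-thought prompting, and a generate, validate, refine loop) without describing their internals. The following is only a minimal sketch of that general pattern, not the NTLRAG implementation. Every function name, both toy retrieval strategies, the prompt wording, and the validation rule are assumptions made for illustration, and `generate` is a placeholder for any callable that sends a prompt to a language model and returns text.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve_by_overlap(keywords, docs, k=2):
    """Assumed strategy 1: rank documents by the number of distinct topic keywords they contain."""
    kw = set(keywords)
    scored = [(len(kw & set(tokenize(d))), i) for i, d in enumerate(docs)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [docs[i] for score, i in scored[:k] if score > 0]


def retrieve_by_cosine(keywords, docs, k=2):
    """Assumed strategy 2: rank documents by cosine similarity of raw term-count vectors."""
    query = Counter(keywords)

    def cosine(doc):
        counts = Counter(tokenize(doc))
        dot = sum(query[t] * counts[t] for t in query)
        norm = math.sqrt(sum(v * v for v in query.values())) * math.sqrt(
            sum(v * v for v in counts.values())
        )
        return dot / norm if norm else 0.0

    ranked = sorted(range(len(docs)), key=lambda i: (-cosine(docs[i]), i))
    return [docs[i] for i in ranked[:k] if cosine(docs[i]) > 0]


def retrieve_context(keywords, docs, k=2):
    """Merge the results of both strategies, dropping duplicates but keeping order."""
    merged = retrieve_by_overlap(keywords, docs, k) + retrieve_by_cosine(keywords, docs, k)
    return list(dict.fromkeys(merged))


def build_prompt(keywords, context):
    """Assemble a prompt that asks the model to reason step by step (hypothetical wording)."""
    messages = "\n".join(f"- {doc}" for doc in context)
    return (
        f"Topic keywords: {', '.join(keywords)}\n"
        f"Example messages:\n{messages}\n"
        "Think step by step about who is involved and what is happening, "
        "then answer with one short narrative sentence that labels the topic."
    )


def is_valid(label, keywords):
    """Assumed validation rule: the label is non-empty and mentions at least one keyword."""
    return bool(label) and bool(set(tokenize(label)) & set(keywords))


def label_topic(keywords, docs, generate, max_rounds=2):
    """Generate a label, validate it, and retry with feedback; fall back to the keyword list."""
    prompt = build_prompt(keywords, retrieve_context(keywords, docs))
    for _ in range(max_rounds):
        label = generate(prompt).strip()
        if is_valid(label, keywords):
            return label
        prompt += "\nThe previous label was rejected. Stay closer to the keywords."
    return ", ".join(keywords)
```

In use, `generate` would wrap a call to whichever model is available, while `keywords` and `docs` would come from the topic model's output for one cluster. Falling back to the plain keyword list when validation keeps failing is a design choice of this sketch, so a caller always receives some label.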