NTLRAG: Narrative Topic Labels derived with Retrieval Augmented Generation

2026-02-19
AI Summary
This study addresses the limited interpretability and practical utility of traditional topic models applied to large-scale text corpora such as social media, where keyword lists often fail to convey the core semantics of a document cluster accurately and intuitively. The authors propose NTLRAG, a scalable framework that integrates retrieval-augmented generation (RAG) with chain-of-thought elements for topic label generation. The framework can be combined with any standard topic model and uses multiple retrieval strategies to turn raw keyword lists into semantically precise, human-readable narrative topic labels. In a user study on three real-world datasets comprising more than 6.7 million social media messages, 16 human evaluators rated the generated labels as more interpretable and usable than traditional keyword lists.

๐Ÿ“ Abstract
Topic modeling has evolved as an important means to identify evident or hidden topics within large collections of text documents. Topic modeling approaches are often used for analyzing and making sense of social media discussions consisting of millions of short text messages. However, assigning meaningful topic labels to document clusters remains challenging, as users are commonly presented with unstructured keyword lists that may not accurately capture the respective core topic. In this paper, we introduce Narrative Topic Labels derived with Retrieval Augmented Generation (NTLRAG), a scalable and extensible framework that generates semantically precise and human-interpretable narrative topic labels. Our narrative topic labels provide a context-rich, intuitive concept to describe topic model output. In particular, NTLRAG uses retrieval augmented generation (RAG) techniques and considers multiple retrieval strategies as well as chain-of-thought elements to provide high-quality output. NTLRAG can be combined with any standard topic model to generate, validate, and refine narratives which then serve as narrative topic labels. We evaluated NTLRAG with a user study and three real-world datasets consisting of more than 6.7 million social media messages that have been sent by more than 2.7 million users. The user study involved 16 human evaluators who found that our narrative topic labels offer superior interpretability and usability as compared to traditional keyword lists. An implementation of NTLRAG is publicly available for download.
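
The generate, validate, and refine loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`retrieve`, `build_prompt`, `is_valid`, `label_topic`), the keyword-overlap retrieval, the prompt wording, and the validation rule are all assumptions made for illustration, and `llm` stands in for any text-generation callable.

```python
from typing import Callable, List


def retrieve(keywords: List[str], docs: List[str], k: int = 3) -> List[str]:
    """Hypothetical lexical retrieval: rank documents by topic-keyword overlap."""
    kw = {w.lower() for w in keywords}

    def score(doc: str) -> int:
        return len(kw & set(doc.lower().split()))

    ranked = sorted(docs, key=score, reverse=True)  # stable sort keeps input order on ties
    return [d for d in ranked[:k] if score(d) > 0]


def build_prompt(keywords: List[str], context: List[str]) -> str:
    """Assumed prompt layout with a step-by-step (chain-of-thought style) instruction."""
    lines = [
        "Topic keywords: " + ", ".join(keywords),
        "Example messages:",
        *("- " + c for c in context),
        "Think step by step about who does what, then answer with one short narrative label.",
    ]
    return "\n".join(lines)


def is_valid(label: str, keywords: List[str]) -> bool:
    """Assumed validation rule: label is non-empty and mentions at least one keyword."""
    text = label.lower()
    return bool(text) and any(w.lower() in text for w in keywords)


def label_topic(
    keywords: List[str],
    docs: List[str],
    llm: Callable[[str], str],
    max_rounds: int = 2,
) -> str:
    """Generate a label, then validate it and ask for a refinement if it is rejected."""
    prompt = build_prompt(keywords, retrieve(keywords, docs))
    label = llm(prompt).strip()
    for _ in range(max_rounds):
        if is_valid(label, keywords):
            break
        label = llm(prompt + "\nThe previous label was rejected. Try again.").strip()
    return label
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client, and the retrieval function could be swapped for an embedding-based strategy without changing the loop.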
Problem

Research questions and friction points this paper is trying to address.

topic modeling
topic labeling
interpretability
social media analysis
narrative generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval Augmented Generation
Narrative Topic Labels
Topic Modeling
Chain-of-Thought
Human-Interpretable AI
Lisa Grobelscheg
Institute for Complex Networks, Vienna University of Economics and Business (WU Vienna), Welthandelsplatz 1, Vienna, 1020, Austria; CAMPUS 02, University of Applied Sciences, Körblergasse 126, 8010, Graz, Austria
Ema Kahr
Assistant Professor, Vienna University of Economics and Business (WU)
AI, Complex Networks, Computational Social Science, Data Science, Psycholinguistics
Mark Strembeck
WU Vienna
Complex Systems, Computational Social Science, Cyber Security, Data Science, Software Engineering