🤖 AI Summary
Literary texts abound in sensorially rich "showing" descriptions, which conventional bag-of-words topic models (e.g., LDA) fail to capture effectively because they rely on surface-level co-occurrence statistics. To address this, we propose Retell: a lightweight, rephrasing-driven topic modeling framework that prompts small-scale generative language models (e.g., Phi-3, TinyLlama) to perform controllable abstraction of narrative passages, transforming concrete, sensory descriptions into conceptualized representations, before applying LDA for topic inference. This paradigm requires no fine-tuning or large-scale training; it achieves end-to-end topic modeling through prompt engineering alone. Evaluated on race/cultural identity narratives, Retell achieves an F1 score of 0.78 against expert annotations, outperforming both direct LM-based topic extraction and standard LDA by approximately 32%, and yielding substantial gains in the semantic fidelity and human interpretability of inferred topics.
📄 Abstract
Conventional bag-of-words approaches for topic modeling, like latent Dirichlet allocation (LDA), struggle with literary text. Literature challenges lexical methods because narrative language focuses on immersive sensory details instead of abstract description or exposition: writers are advised to "show, don't tell." We propose Retell, a simple, accessible topic modeling approach for literature. Here, we prompt resource-efficient, generative language models (LMs) to tell what passages show, thereby translating narratives' surface forms into higher-level concepts and themes. By running LDA on LMs' retellings of passages, we can obtain more precise and informative topics than by running LDA alone or by directly asking LMs to list topics. To investigate the potential of our method for cultural analytics, we compare our method's outputs to expert-guided annotations in a case study on racial/cultural identity in high school English language arts books.
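The pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `retell` function below stands in for prompting a small generative LM (e.g., Phi-3 or TinyLlama) to "tell what a passage shows," and is mocked here with a hand-written lookup table so the example runs offline. The LDA step uses scikit-learn's `LatentDirichletAllocation` as one standard choice of bag-of-words topic model.

```python
# Sketch of the Retell idea: rephrase "showing" passages into
# "telling" abstractions, then run ordinary LDA on the retellings.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer


def retell(passage: str) -> str:
    """Stand-in for the LM rephrasing step.

    In the actual method this would be a prompted small LM; here it
    is a mock lookup so the sketch is self-contained and runnable.
    """
    mock_lm = {
        "Her hands trembled as the letter slipped to the floor.":
            "The character feels fear and shock.",
        "Rain hammered the tin roof while she stared at the door.":
            "The character feels anxious anticipation.",
        "He counted the coins twice, then once more.":
            "The character worries about money and poverty.",
        "The empty cupboard echoed when she closed it.":
            "The family experiences poverty and hunger.",
    }
    return mock_lm[passage]


passages = [
    "Her hands trembled as the letter slipped to the floor.",
    "Rain hammered the tin roof while she stared at the door.",
    "He counted the coins twice, then once more.",
    "The empty cupboard echoed when she closed it.",
]

# Step 1: translate surface forms into higher-level concepts.
retold = [retell(p) for p in passages]

# Step 2: run standard LDA on the retellings rather than raw text.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(retold)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # per-passage topic mixtures
```

Because LDA now sees abstract vocabulary ("fear", "poverty") instead of sensory surface words ("trembled", "cupboard"), passages describing the same theme share tokens and can land in the same topic.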