Tell, Don't Show: Leveraging Language Models' Abstractive Retellings to Model Literary Themes

📅 2025-05-29
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Literary texts abound in sensorially rich “showing” descriptions, which conventional bag-of-words topic models (e.g., LDA) fail to capture effectively due to their reliance on surface-level co-occurrence statistics. To address this, we propose Retell: a lightweight, rephrasing-driven topic modeling framework that employs small-scale generative language models (e.g., Phi-3, TinyLlama) to perform controllable abstraction of narrative passages—transforming concrete, sensory descriptions into conceptualized representations—before applying LDA for topic inference. This paradigm requires no fine-tuning or large-scale training; instead, it achieves end-to-end topic modeling solely through prompt engineering. In evaluation on race/cultural identity narratives, Retell achieves an F1-score of 0.78 against expert annotations—outperforming both direct LM-based extraction and standard LDA by approximately 32%—demonstrating substantial gains in semantic fidelity and human interpretability of inferred topics.

📝 Abstract
Conventional bag-of-words approaches for topic modeling, like latent Dirichlet allocation (LDA), struggle with literary text. Literature challenges lexical methods because narrative language focuses on immersive sensory details instead of abstractive description or exposition: writers are advised to "show, don't tell." We propose Retell, a simple, accessible topic modeling approach for literature. Here, we prompt resource-efficient, generative language models (LMs) to tell what passages show, thereby translating narratives' surface forms into higher-level concepts and themes. By running LDA on LMs' retellings of passages, we can obtain more precise and informative topics than by running LDA alone or by directly asking LMs to list topics. To investigate the potential of our method for cultural analytics, we compare our method's outputs to expert-guided annotations in a case study on racial/cultural identity in high school English language arts books.
Problem

Research questions and friction points this paper is trying to address.

Traditional topic modeling struggles with literary texts
Proposes using language models to retell passages for themes
Compares method's results to expert annotations on identity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses generative language models for retelling
Applies LDA on LM retellings for topics
Translates narratives into higher-level themes
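The two-stage pipeline above (retell passages with a small LM, then run LDA on the retellings) can be sketched roughly as follows. This is a minimal sketch, not the paper's implementation: the prompt wording and the `generate` callable are illustrative assumptions, not the authors' actual prompts or models.

```python
def build_retell_prompt(passage: str) -> str:
    """Build a retelling prompt for a literary passage.

    Hypothetical prompt wording; the paper's actual prompts
    are not reproduced here.
    """
    return (
        "Retell the following literary passage in plain, abstract terms, "
        "stating the concepts and themes it conveys rather than its "
        "sensory details.\n\nPassage:\n" + passage + "\n\nRetelling:"
    )


def retell_corpus(passages, generate):
    """Translate each passage's surface form into a higher-level retelling.

    `generate` is any callable wrapping a resource-efficient generative LM
    (e.g., Phi-3 or TinyLlama) that maps a prompt string to generated text.
    """
    return [generate(build_retell_prompt(p)) for p in passages]
```

The resulting retellings, rather than the raw passages, would then be passed to any standard LDA implementation (for instance, scikit-learn's `LatentDirichletAllocation` over a bag-of-words matrix) to infer topics.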
Li Lucy
University of California, Berkeley
Camilla Griffiths
Stanford University
Sarah Levine
Stanford University
Jennifer L. Eberhardt
Stanford University
Dorottya Demszky
Assistant Professor, Stanford University
natural language processing · education data science · teacher professional learning
David Bamman
UC Berkeley
Natural Language Processing · Machine Learning · Digital Humanities · Computational Social Science