Large language models for folktale type automation based on motifs: Cinderella case study

📅 2025-10-21
🤖 AI Summary
This study addresses the challenge of large-scale, cross-lingual variant analysis in folk narrative typology. It proposes an automated framework for motif identification and comparison that combines large language models (LLMs) with NLP techniques to extract fine-grained motifs from multilingual *Cinderella* texts. Clustering and dimensionality reduction are then used to model and visualize structural similarity across hundreds of variants. Its key contributions are twofold: (1) integrating LLMs into computational folklore analysis to reduce the bottleneck of manual coding; and (2) constructing a cross-lingually aligned motif vector space that supports the detection of thematic consistency and of motif variation patterns. The evaluation is reported to show the framework's effectiveness in motif identification accuracy, cross-lingual comparability, and interpretability of cultural divergence, offering a scalable, digital-humanities-driven methodology for folk narrative studies.

📝 Abstract
Artificial intelligence approaches are being adapted to many research areas, including digital humanities. We built a methodology for large-scale analyses in folkloristics. Using machine learning and natural language processing, we automatically detected motifs in a large collection of Cinderella variants and analysed their similarities and differences with clustering and dimensionality reduction. The results show that large language models detect complex interactions in tales, enabling computational analysis of extensive text collections and facilitating cross-lingual comparisons.

Problem

Research questions and friction points this paper is trying to address.

Automating folktale type classification using motif detection
Analyzing similarities in Cinderella variants through computational methods
Enabling cross-lingual comparisons of folktales with large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using large language models for motif detection
Applying clustering to analyze tale similarities
Employing dimensionality reduction for cross-lingual comparisons
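
The clustering step listed above can be illustrated with a minimal sketch. It assumes that motifs have already been extracted for each tale (by an LLM in the paper; hard-coded here). The variant names, motif labels, the choice of Jaccard distance with average linkage, and the merge threshold are all illustrative assumptions, not the authors' actual pipeline or data. Dimensionality reduction is not shown.

```python
from itertools import combinations

# Hypothetical motif annotations. In the paper these would be produced by an
# LLM; the variant names and motif labels below are made up for illustration.
TALES = {
    "variant_a": {"persecuted_heroine", "magical_helper", "lost_slipper",
                  "marriage_to_prince"},
    "variant_b": {"persecuted_heroine", "magical_helper", "lost_slipper",
                  "marriage_to_prince", "three_balls"},
    "variant_c": {"persecuted_heroine", "helpful_animal", "impossible_tasks"},
    "variant_d": {"persecuted_heroine", "helpful_animal", "impossible_tasks",
                  "magic_tree"},
}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets have distance 0."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, tales):
    """Mean pairwise Jaccard distance between the tales of two clusters."""
    dists = [jaccard_distance(tales[x], tales[y]) for x in c1 for y in c2]
    return sum(dists) / len(dists)


def cluster(tales, threshold):
    """Agglomerative clustering: repeatedly merge the two closest clusters
    until the closest pair is farther apart than `threshold`."""
    clusters = [[name] for name in sorted(tales)]
    while len(clusters) > 1:
        # Find the pair of cluster indices with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], tales),
        )
        if average_linkage(clusters[i], clusters[j], tales) > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    for group in cluster(TALES, threshold=0.5):
        print(group)
```

With a threshold of 0.5, the toy data splits into two groups: the variants sharing the slipper and helper motifs, and the variants sharing the animal and task motifs. A real collection would likely need a tuned threshold or a different linkage criterion.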
Tjaša Arčon
University of Ljubljana, Faculty of Computer and Information Science, Slovenia
Marko Robnik-Šikonja
Professor of Computer Science, University of Ljubljana, Head of ML & LT Lab
Machine Learning, Artificial Intelligence, Natural Language Processing, Explainable AI
Polona Tratnik
University of Ljubljana, Faculty of Computer and Information Science; University of Ljubljana, Faculty of Arts, Slovenia; Institute IRRIS, Slovenia