Evaluating the Creativity of LLMs in Persian Literary Text Generation

📅 2025-09-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing evaluations of large language model (LLM) creativity largely neglect non-English literary traditions, particularly culturally grounded rhetorical competence. Method: We construct a user-generated Persian literary dataset spanning 20 thematic domains and propose the first systematic, culture-adapted creativity assessment framework for non-English contexts, which quantifies originality, fluency, flexibility, and elaboration. Drawing on the Torrance Tests of Creative Thinking, we adapt and validate an automated LLM-based scoring mechanism that achieves high inter-rater reliability with human annotators (ICC > 0.85), substantially reducing evaluation cost. Results: Empirical analysis shows that LLMs are strong at deploying core rhetorical devices (simile, metaphor, hyperbole, and antithesis) but still face persistent bottlenecks in cultural expression. The framework provides both empirical evidence and methodological infrastructure to guide the development and refinement of cross-lingual literary generation models.

📝 Abstract

Large language models (LLMs) have demonstrated notable creative abilities in generating literary texts, including poetry and short stories. However, prior research has primarily centered on English, with limited exploration of non-English literary traditions and without standardized methods for assessing creativity. In this paper, we evaluate the capacity of LLMs to generate Persian literary text enriched with culturally relevant expressions. We build a dataset of user-generated Persian literary texts spanning 20 diverse topics and assess model outputs along four creativity dimensions (originality, fluency, flexibility, and elaboration) by adapting the Torrance Tests of Creative Thinking. To reduce evaluation costs, we adopt an LLM as a judge for automated scoring and validate its reliability against human judgments using intraclass correlation coefficients, observing strong agreement. In addition, we analyze the models' ability to understand and employ four core literary devices: simile, metaphor, hyperbole, and antithesis. Our results highlight both the strengths and limitations of LLMs in Persian literary text generation, underscoring the need for further refinement.
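To make the reliability check concrete, the sketch below computes an intraclass correlation coefficient between raters who score the same texts. The abstract does not say which ICC form the authors used, so the choice of ICC(2,1) (two-way random effects, absolute agreement, single rater) is an assumption, as are the function name `icc_2_1` and the made-up scores in the usage lines.

```python
from typing import Sequence


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has one row per rated text and one column per rater
    (for example, column 0 = human annotator, column 1 = LLM judge).
    The ICC form is an assumption; the paper may have used another one.
    """
    n = len(ratings)                      # number of rated texts
    k = len(ratings[0]) if n else 0       # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with >= 2 texts and >= 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: texts (rows), raters (columns), residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denom == 0:
        raise ValueError("ratings have no variance; ICC is undefined")
    return (ms_rows - ms_error) / denom


# Illustrative, made-up scores: each row is one text, (human, llm_judge).
example = [(4, 4), (2, 3), (5, 5), (3, 3), (1, 2)]
print(f"ICC(2,1) = {icc_2_1(example):.3f}")
```

An absolute-agreement form penalizes a judge that is consistently more lenient or harsher than the human annotators, which matters when the automated scores are meant to replace human scores rather than merely rank texts in the same order.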

Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM creativity in Persian literary text generation

Developing standardized methods for assessing creativity dimensions

Analyzing LLM understanding of cultural literary devices

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapted Torrance Tests for creativity dimensions

Used LLM as automated judge with human validation

Analyzed four core literary devices in Persian texts
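As a rough illustration of the LLM-as-judge step, the sketch below turns a judge's reply into scores for the four Torrance-derived dimensions. The JSON reply format, the 1 to 5 scale, and the names `CreativityScores` and `parse_judge_reply` are all hypothetical; the paper's actual prompt and scoring scale are not given here.

```python
import json
from dataclasses import dataclass

# The four creativity dimensions named in the paper.
DIMENSIONS = ("originality", "fluency", "flexibility", "elaboration")


@dataclass(frozen=True)
class CreativityScores:
    originality: float
    fluency: float
    flexibility: float
    elaboration: float

    def mean(self) -> float:
        """Unweighted average across dimensions (an illustrative aggregate)."""
        return sum(getattr(self, d) for d in DIMENSIONS) / len(DIMENSIONS)


def parse_judge_reply(reply: str, low: float = 1.0, high: float = 5.0) -> CreativityScores:
    """Parse a judge reply assumed to be a JSON object with one score per dimension.

    The reply format and the [low, high] range are assumptions for this sketch.
    """
    data = json.loads(reply)
    values = {}
    for dim in DIMENSIONS:
        if dim not in data:
            raise ValueError(f"judge reply is missing '{dim}'")
        score = float(data[dim])
        if not low <= score <= high:
            raise ValueError(f"'{dim}' score {score} is outside [{low}, {high}]")
        values[dim] = score
    return CreativityScores(**values)


# Hypothetical reply from a judge model.
reply = '{"originality": 4, "fluency": 5, "flexibility": 3, "elaboration": 4}'
scores = parse_judge_reply(reply)
print(scores, scores.mean())
```

Rejecting missing or out-of-range scores up front keeps malformed judge outputs from silently entering the agreement analysis against human ratings.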

🔎 Similar Papers

Divergent Creativity in Humans and Large Language Models