Beyond Divergent Creativity: A Human-Based Evaluation of Creativity in Large Language Models

📅 2026-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses a critical limitation of current large language model (LLM) creativity evaluation methods such as the Divergent Association Task (DAT): they lack grounding in human creativity theory and overemphasize novelty while neglecting appropriateness, yielding results that are hard to interpret. Drawing on psychological theories of creativity, the authors propose the Conditional Divergent Association Task (CDAT), which for the first time integrates appropriateness as a constraint within novelty assessment while preserving simplicity and objectivity, thereby distinguishing random noise from genuine creativity more effectively. In comparative experiments that establish a new benchmark, smaller model families exhibit higher creativity on CDAT, whereas state-of-the-art models, shaped by alignment training, prioritize appropriateness at the expense of novelty. The dataset and code are publicly released alongside this work.

📝 Abstract
Large language models (LLMs) are increasingly used in verbal creative tasks. However, previous assessments of the creative capabilities of LLMs remain weakly grounded in human creativity theory and are thus hard to interpret. The widely used Divergent Association Task (DAT) focuses on novelty, ignoring appropriateness, a core component of creativity. We evaluate a range of state-of-the-art LLMs on DAT and show that their scores on the task are lower than those of two baselines that do not possess any creative abilities, undermining its validity for model evaluation. Grounded in human creativity theory, which defines creativity as the combination of novelty and appropriateness, we introduce the Conditional Divergent Association Task (CDAT). CDAT evaluates novelty conditional on contextual appropriateness, separating noise from creativity better than DAT, while remaining simple and objective. Under CDAT, smaller model families often show the most creativity, whereas advanced families favor appropriateness at lower novelty. We hypothesize that training and alignment likely shift models along this frontier, making outputs more appropriate but less creative. We release the dataset and code.
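As a rough illustration (not the authors' released code), the novelty scoring that DAT-style tasks rely on can be sketched as the mean pairwise cosine distance among embeddings of the generated words. The function names and the toy 2-D vectors below are illustrative stand-ins; the actual DAT averages pairwise distances between GloVe embeddings of 10 nouns.

```python
import math
from itertools import combinations

def cosine_distance(a, b):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def dat_style_score(vectors):
    """Mean pairwise cosine distance among word vectors.

    Higher scores indicate a more semantically divergent ("novel")
    word set. Real use would substitute actual word embeddings for
    the toy vectors below.
    """
    pairs = list(combinations(vectors, 2))
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)

# Toy 2-D vectors standing in for word embeddings:
similar_words = [(1.0, 0.0), (0.9, 0.1), (0.95, 0.05)]
diverse_words = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
print(dat_style_score(similar_words) < dat_style_score(diverse_words))  # True
```

Per the abstract, CDAT would add an appropriateness condition on top of such a novelty score, e.g. scoring divergence only among responses judged appropriate to a given context, though the paper's exact scoring procedure is not reproduced here.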
Problem

Research questions and friction points this paper is trying to address.

creativity evaluation
large language models
divergent association task
novelty
appropriateness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conditional Divergent Association Task
creativity evaluation
large language models
novelty-appropriateness trade-off
human-based creativity assessment