CREATE: Testing LLMs for Associative Creativity

📅 2026-03-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes CREATE, a benchmark for evaluating large language models' associative creativity: the ability to forge novel and meaningful connections between concepts. Models are prompted to generate multiple paths connecting concepts from their parametric knowledge, and the resulting path sets are graded objectively for specificity and diversity, which makes a sizable benchmark feasible. Experimental results indicate that the strongest frontier models achieve higher creative utility than others but do not saturate the benchmark. Thinking models are not always more effective, even with high token budgets, and recent creative prompting approaches yield only limited additional improvement. The authors position CREATE as a sandbox for developing methods that improve associative creativity.

📝 Abstract

A key component of creativity is associative reasoning: the ability to draw novel yet meaningful connections between concepts. We introduce CREATE, a benchmark designed to evaluate models' capacity for creative associative reasoning. CREATE requires models to generate sets of paths connecting concepts in a model's parametric knowledge. Paths should have high specificity (distinctiveness and closeness of the concept connection) and high diversity (dissimilarity from other paths), and models are scored more highly if they produce a larger set of strong, diverse paths. This task shares demands of real creativity tasks like hypothesis generation, including an extremely large search space, but enables collection of a sizable benchmark with objective answer grading. Evaluation of frontier models shows that the strongest models achieve higher creative utility than others, with the high multiplicity of answers and complexity of the search making benchmark saturation difficult to achieve. Furthermore, our results illustrate that thinking models are not always more effective on our task, even with high token budgets. Recent approaches for creative prompting give some but limited additional improvement. CREATE provides a sandbox for developing new methods to improve models' capacity for associative creativity.
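
To make the scoring idea in the abstract concrete, here is a minimal sketch of how a set of paths might be rewarded for being both specific and diverse. It is not the paper's metric: the function names, the thresholds, the use of Jaccard distance over intermediate concepts, and the assumption that each path arrives with a precomputed specificity score in [0, 1] are all hypothetical choices made for illustration.

```python
from typing import Sequence

Path = tuple[str, ...]  # (start concept, intermediate concepts..., end concept)


def jaccard_distance(a: Path, b: Path) -> float:
    """Dissimilarity of two paths, compared on intermediate concepts only.

    Endpoints are excluded because every path shares them.
    Two paths with no intermediates are treated as identical (distance 0).
    """
    sa, sb = set(a[1:-1]), set(b[1:-1])
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def creative_utility(
    paths: Sequence[Path],
    specificity: Sequence[float],   # hypothetical per-path scores in [0, 1]
    min_specificity: float = 0.5,   # assumed cutoff for a "strong" path
    min_distance: float = 0.5,      # assumed cutoff for a "diverse" path
) -> float:
    """Greedily keep strong paths that differ enough from those already kept,
    and return the summed specificity of the kept set."""
    ranked = sorted(zip(paths, specificity), key=lambda ps: ps[1], reverse=True)
    kept: list[Path] = []
    total = 0.0
    for path, spec in ranked:
        if spec < min_specificity:
            continue  # too weak a connection to count
        if all(jaccard_distance(path, k) >= min_distance for k in kept):
            kept.append(path)
            total += spec
    return total


# Toy example with placeholder concepts A, B and intermediates x, y, z.
example_paths = [
    ("A", "x", "B"),
    ("A", "x", "B"),  # duplicate of the first path: adds no diversity
    ("A", "y", "B"),
    ("A", "z", "B"),  # distinct, but its specificity is below the cutoff
]
example_specificity = [1.0, 0.75, 0.5, 0.25]
example_score = creative_utility(example_paths, example_specificity)  # 1.5
```

Under these assumptions, a larger set of strong, mutually dissimilar paths scores higher, while near-duplicates and weak connections add nothing, which mirrors the qualitative behavior the abstract describes.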

Problem

Research questions and friction points this paper is trying to address.

associative creativity

creative reasoning

concept association

large language models

creativity evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

associative creativity

benchmark

conceptual paths