What Shapes a Creative Machine Mind? Comprehensively Benchmarking Creativity in Foundation Models

📅 2025-10-04
🤖 AI Summary
Existing AI creativity evaluation frameworks are fragmented and lack robust theoretical grounding. Method: We propose C²-Eval—the first fine-grained, unified benchmark grounded in social science theory—establishing a dual-path taxonomy distinguishing convergent and divergent creativity, and operationalizing three core dimensions: Usefulness, Originality, and Surprise. It integrates human and automated evaluation methods. Contribution/Results: C²-Eval enables the first systematic, multi-task, cross-domain creativity assessment of both leading closed- and open-weight large language models. Experiments reveal significant trade-offs and structural limitations across the three dimensions in current foundation models. The framework accurately characterizes the developmental status and evolutionary trajectory of generative AI creativity, offering a novel paradigm for theoretical modeling and capability evaluation of machine creativity.

📝 Abstract
The meteoric rise of foundation models (FMs) has expanded their capabilities far beyond conventional tasks. Creativity, long regarded as a hallmark of human intelligence and a driver of innovation, is now increasingly recognized as a critical dimension of machine intelligence in the era of generative FMs, complementing traditional measures of accuracy. However, existing evaluation frameworks for creativity remain fragmented, relying on ad hoc metrics not firmly grounded in established theories. To address this gap, we introduce C²-Eval, a holistic benchmark for unified assessment of creativity in FMs. C²-Eval distinguishes between two complementary forms of creativity: convergent creativity, where tasks admit constrained solutions (e.g., code generation), and divergent creativity, where tasks are open-ended (e.g., storytelling). It evaluates both dimensions using fine-grained criteria derived from social-science theory, focusing on Usefulness, Originality, and Surprise (U-O-S). Through extensive experiments on leading proprietary and open-source models, we analyze trade-offs in their creative capabilities. Our results highlight both the strengths and challenges of current FMs in pursuing a creative machine mind, showing that C²-Eval is an effective lens for examining the evolving landscape of creative AI.
Problem

Research questions and friction points this paper is trying to address.

Existing creativity metrics for foundation models are fragmented and lack theoretical grounding
How to benchmark both convergent and divergent creative capabilities in a single framework
How to assess creativity along principled criteria: usefulness, originality, and surprise
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces C²-Eval, a unified, fine-grained benchmark for creativity assessment
Evaluates both convergent (constrained, e.g., code generation) and divergent (open-ended, e.g., storytelling) creativity
Grounds scoring in the Usefulness, Originality, and Surprise (U-O-S) criteria drawn from social-science theory
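The U-O-S scoring idea can be sketched as a weighted aggregation of per-dimension judge ratings. Everything below is an illustrative assumption: the 1–5 rating scale, the equal default weights, and the function name `uos_score` are not taken from the paper, which combines human and automated evaluation under its own protocol.

```python
# Illustrative sketch of U-O-S aggregation (not the paper's actual protocol).
# Assumptions: judges rate each dimension on a 1-5 scale; dimensions are
# combined by a simple weighted mean with equal default weights.
from statistics import mean

DIMS = ("usefulness", "originality", "surprise")

def uos_score(ratings, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine per-dimension judge ratings into a single creativity score.

    ratings: dict mapping each U-O-S dimension to a list of judge scores.
    weights: relative weight of each dimension, in DIMS order.
    Returns (per-dimension means, weighted overall score).
    """
    per_dim = {d: mean(ratings[d]) for d in DIMS}
    overall = sum(w * per_dim[d] for w, d in zip(weights, DIMS))
    return per_dim, overall

# Example: three judges rate one model output on each dimension.
per_dim, overall = uos_score({
    "usefulness": [4, 4, 5],
    "originality": [3, 2, 3],
    "surprise": [2, 3, 2],
})
```

Separating the per-dimension means from the overall score makes the trade-offs the paper reports visible: a model can score high on Usefulness while remaining low on Originality and Surprise.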