🤖 AI Summary
This study challenges the prevailing assumption that AI possesses genuine creative cognition. We systematically compare GPT-3.5-turbo, GPT-4, and GPT-4o against human participants across core creative cognitive processes, including divergent and convergent thinking, insight problem solving, associative fluency, representational transformation, and creative evaluation. Using standardized cognitive psychology paradigms (e.g., the Remote Associates Test, the Nine-Dot Problem, free and chained association, and representational change tasks), augmented by computational creativity scoring models and decision error analysis, we find that while LLMs outperform humans in ideational quantity, they exhibit marked mechanistic deficits: a 37% reduction in forward associative flow, 42% lower representational transformation accuracy, 2.1× higher creative selection error rates, and no dynamic trade-off strategy between novelty and appropriateness. Critically, this work provides the first empirical evidence that AI's apparent creative advantage stems from statistical pattern matching rather than authentic creative cognition, thereby establishing a novel cognitive framework for evaluating AI creativity.
📝 Abstract
A key objective in artificial intelligence (AI) development is to create systems that match or surpass human creativity. Although current AI models perform well across diverse creative tasks, it remains unclear whether these achievements reflect genuine creative thinking. This study examined whether AI models (GPT-3.5-turbo, GPT-4, and GPT-4o) engage in creative thinking by comparing their performance with that of human participants across a range of creative tasks and core cognitive processes. The AI models outperformed humans in divergent thinking, convergent thinking, and insight problem solving, but underperformed in creative writing. Compared to humans, the models produced lower forward flow values in both free and chained association tasks and showed lower accuracy on the representational change task. In creative evaluation, the models exhibited no significant correlation between the weights assigned to novelty and appropriateness when predicting creativity ratings, suggesting the absence of a human-like trade-off strategy. They also had higher decision error scores in creative selection, indicating difficulty identifying the most creative ideas. Together, these findings suggest that while AI can mimic human creativity, its strong performance on creative tasks is likely driven by non-creative mechanisms rather than genuine creative thinking.
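Forward flow, the association metric reported above, is conventionally operationalized (following Gray et al.'s forward-flow measure) as the mean semantic distance of each response in an association chain from all responses that preceded it. The sketch below illustrates that computation; the hand-written three-dimensional vectors are toy stand-ins for real word embeddings, not part of the study's materials:

```python
# Minimal sketch of a forward-flow computation over an association chain.
# Semantic distance here is 1 - cosine similarity between embedding vectors.
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def forward_flow(embeddings):
    """Average, over each word after the first, of that word's mean
    semantic distance from all words earlier in the chain."""
    flows = []
    for i in range(1, len(embeddings)):
        dists = [1 - cosine(embeddings[j], embeddings[i]) for j in range(i)]
        flows.append(sum(dists) / len(dists))
    return sum(flows) / len(flows)

# Toy chain "dog" -> "cat" -> "piano" with hypothetical embeddings:
# a close associate (cat) contributes little flow, a distant one (piano) a lot.
chain = [
    [1.0, 0.0, 0.1],   # dog
    [0.9, 0.1, 0.1],   # cat
    [0.0, 1.0, 0.2],   # piano
]
print(round(forward_flow(chain), 3))
```

On this definition, a chain that keeps revisiting semantically similar words scores near zero, while one that leaps between unrelated concepts scores high, which is why lower forward flow in the AI models is read as more constrained associative movement.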