🤖 AI Summary
This study systematically investigates differences between large language models (LLMs) and humans in divergent creativity, with semantic diversity as the core dimension. Method: We propose the first reproducible, quantitative cross-subject evaluation framework for divergent creativity, integrating cognitive-psychology scales, computational semantic-similarity metrics (BERTScore and Word2Vec), and diversity measures (unigram entropy and uniqueness). The framework is benchmarked on 100,000 real human behavioral responses and state-of-the-art LLMs. Contribution/Results: Certain LLMs significantly outperform the human population average on divergent-association and creative-writing tasks, approaching the performance of highly creative individuals. The framework is publicly released, establishing a new empirical paradigm for measurable progress in creative AI and for foundational research into the nature of human originality.
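The diversity measures named above can be illustrated with a minimal sketch. The exact definitions used by the framework are not given here, so the following is an assumed, simplified reading: unigram entropy as Shannon entropy over a response's token distribution, and uniqueness as the fraction of responses that occur only once in a pool.

```python
import math
from collections import Counter

def unigram_entropy(text: str) -> float:
    """Shannon entropy (in bits) over the unigram distribution of one text.

    Higher values indicate a more varied vocabulary within the response.
    """
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def uniqueness(responses: list[str]) -> float:
    """Fraction of responses appearing exactly once in the pool.

    A crude proxy for originality relative to other respondents.
    """
    normalized = [r.strip().lower() for r in responses]
    counts = Counter(normalized)
    return sum(1 for r in normalized if counts[r] == 1) / len(responses)

# Example: a two-symbol uniform distribution has entropy 1 bit,
# and one of three pooled responses ("dog") is unique.
print(unigram_entropy("a a b b"))            # 1.0
print(uniqueness(["cat", "dog", "cat"]))     # ~0.333
```

In the actual framework these lexical measures would be complemented by the semantic-similarity metrics (BERTScore, Word2Vec distances), which capture diversity in meaning rather than in surface tokens.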
📝 Abstract
The recent surge in the capabilities of Large Language Models (LLMs) has led to claims that they are approaching a level of creativity akin to that of humans. This idea has sparked a blend of excitement and apprehension. However, a critical piece missing from this discourse is a systematic evaluation of LLM creativity, particularly in comparison to human divergent thinking. To bridge this gap, we leverage recent advances in creativity science to build a framework for in-depth analysis of divergent creativity in both state-of-the-art LLMs and a substantial dataset of 100,000 humans. We find evidence suggesting that LLMs can indeed surpass human capabilities in specific creative tasks such as divergent association and creative writing. Our quantitative benchmarking framework opens up new paths for the development of more creative LLMs, and it also encourages more granular inquiries into the distinctive elements that constitute human inventive thought processes, compared to those that can be artificially generated.