Rethinking Creativity Evaluation: A Critical Analysis of Existing Creativity Evaluations

📅 2025-08-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing creativity evaluation methods (the creativity index, perplexity, syntactic templates, and LLM-as-a-Judge) behave inconsistently across creative writing, unconventional problem-solving, and research ideation. Each captures only part of creativity: the creativity index emphasizes lexical diversity, perplexity is sensitive to model confidence, syntactic templates miss conceptual creativity, and LLM-as-a-Judge is unstable and biased. By comparing the metrics across these domains, the study shows that they agree with one another only to a limited extent and capture different dimensions of creativity. The core contribution is a critique of current evaluation practice, which underscores the need for more robust, generalizable evaluation frameworks that align better with human judgments of creativity.

📝 Abstract

We systematically examine, analyze, and compare representative creativity measures (creativity index, perplexity, syntactic templates, and LLM-as-a-Judge) across diverse creative domains, including creative writing, unconventional problem-solving, and research ideation. Our analyses reveal that these metrics exhibit limited consistency, capturing different dimensions of creativity. We highlight key limitations, including the creativity index's focus on lexical diversity, perplexity's sensitivity to model confidence, and syntactic templates' inability to capture conceptual creativity. Additionally, LLM-as-a-Judge shows instability and bias. Our findings underscore the need for more robust, generalizable evaluation frameworks that better align with human judgments of creativity.
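To make "limited consistency" between metrics concrete, here is a minimal sketch that is not taken from the paper. It assumes per-token natural-log probabilities are already available from some language model, uses a distinct-n ratio as a simple stand-in for lexical diversity (this is not the paper's creativity index), and measures agreement between two metric score lists with a tie-free Spearman rank correlation. All function names are hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-probability), natural log assumed."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def distinct_n(tokens: Sequence[str], n: int = 2) -> float:
    """Fraction of n-grams that are unique: a simple lexical-diversity proxy."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman rank correlation; assumes no tied values and len >= 2."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")

    def ranks(values: Sequence[float]) -> list:
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, idx in enumerate(order):
            result[idx] = rank
        return result

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    # Hypothetical scores for three texts, for illustration only.
    ppl_scores = [12.0, 30.5, 18.2]
    diversity_scores = [0.91, 0.40, 0.75]
    print(spearman(ppl_scores, diversity_scores))
```

A correlation near zero or below across a set of texts would indicate that the two metrics rank the same outputs differently, which is the kind of inconsistency the abstract describes.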

Problem

Research questions and friction points this paper is trying to address.

Evaluating creativity metrics' consistency across diverse domains

Identifying limitations in current creativity assessment methods

Motivating robust evaluation frameworks that align with human creativity judgments

Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic comparison of diverse creativity metrics

Critical analysis of limitations in current evaluations

Call for robust, generalizable creativity evaluation frameworks

🔎 Similar Papers

Creativity and Machine Learning: A Survey