AI Summary
This study addresses an overlooked failure mode of creative AI systems: diversity collapse, in which ideas converge at the population level and undermine collective creativity even when individual outputs are strong. Treating ideas as congestible resources, the work proposes an interaction-free evaluation framework that compares the distribution of model-generated outputs against matched human baselines. It introduces two metrics, the excess congestion coefficient Δ and the human-relative diversity ratio ρ, and establishes a theoretical link to adoption games, enabling intervention during development. Combining distributional comparison, congestion-kernel modeling, and generation-protocol design, the framework shows that three state-of-the-art large language models fall below human diversity parity on short-story generation, marketing slogans, and alternative-uses tasks. The results also indicate that changing the generation protocol can reduce congestion, and that the estimates stabilize at practical sample sizes.
Abstract
Creative AI systems are typically evaluated at the level of individual utility, yet creative outputs are consumed in populations: an idea loses value when many others produce similar ones. This creates an evaluation blind spot, as AI can improve individual outputs while increasing population-level crowding. We introduce a human-relative framework for benchmarking AI-induced human diversity collapse without requiring human-AI interaction data, providing an ex ante protocol to estimate crowding risk from model-only generations and matched unaided human baselines. By modeling ideas as congestible resources, we show that source-level crowding is identifiable from within-distribution comparisons, yielding an excess-crowding coefficient $Δ$ and a human-relative diversity ratio $ρ$. We show that $ρ \ge 1$ is the no-excess-crowding parity condition and connect $Δ$ to an adoption game with exposure-dependent redundancy costs. Across short stories, marketing slogans, and alternative-uses tasks, three frontier LLMs fall below parity across crowding kernels. Estimates stabilize with feasible model-only sample sizes. Importantly, generation-protocol variants show that crowding can be reduced through targeted design, making diversity collapse an actionable, development-time evaluation target for population-aware creative AI.
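The abstract does not give formulas for the two metrics, so the following is only an illustrative sketch of how a within-distribution comparison could work, not the authors' definitions. It assumes that crowding is the mean pairwise kernel similarity inside one source, that the excess-crowding coefficient is model crowding minus human crowding, and that the diversity ratio is model diversity (one minus crowding) divided by human diversity. The token-overlap kernel, the function names, and the toy ideas are all hypothetical. Under these assumptions a ratio of at least 1 coincides with non-positive excess crowding, which matches the parity condition stated above.

```python
from itertools import combinations


def jaccard_kernel(a: str, b: str) -> float:
    """Toy crowding kernel (assumption): token-set overlap in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def mean_crowding(ideas, kernel=jaccard_kernel) -> float:
    """Mean pairwise similarity within one source (assumed crowding measure)."""
    pairs = list(combinations(ideas, 2))
    if not pairs:
        raise ValueError("need at least two ideas per source")
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


def excess_crowding(model_ideas, human_ideas, kernel=jaccard_kernel) -> float:
    """Assumed form of the coefficient: model crowding minus human crowding."""
    return mean_crowding(model_ideas, kernel) - mean_crowding(human_ideas, kernel)


def diversity_ratio(model_ideas, human_ideas, kernel=jaccard_kernel) -> float:
    """Assumed form of the ratio: model diversity over human diversity."""
    human_diversity = 1.0 - mean_crowding(human_ideas, kernel)
    if human_diversity == 0.0:
        raise ValueError("human baseline has zero diversity under this kernel")
    return (1.0 - mean_crowding(model_ideas, kernel)) / human_diversity


# Hypothetical alternative-uses responses for "brick" (made-up data).
human = [
    "use a brick as a doorstop",
    "grind the brick into pigment",
    "build a garden border",
]
model = [
    "use a brick as a doorstop",
    "use a brick as a paperweight",
    "use a brick as a bookend",
]

delta = excess_crowding(model, human)
rho = diversity_ratio(model, human)
print(f"excess crowding = {delta:.3f}, diversity ratio = {rho:.3f}")
```

In this toy data the model responses share most of their tokens, so the sketch reports positive excess crowding and a ratio below 1. Both functions accept any symmetric similarity function returning values in [0, 1], so an embedding-based kernel could be swapped in to check whether the conclusion holds across kernels.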