🤖 AI Summary
Current AI evaluation benchmarks critically neglect creative writing tasks, exacerbating societal risks including content homogenization and the proliferation of misleading information. Method: We propose a social-impact-oriented benchmark construction paradigm, grounded in 2 million real-world user prompt logs and combined with large-scale topic modeling (LDA and LLM-assisted analysis) and a benchmark gap diagnostic framework. Contribution/Results: Our analysis reveals that creative synthesis requests constitute over 35% of real-world usage, yet mainstream benchmarks cover fewer than 12% of such tasks, identifying this domain as a high-frequency, high-risk evaluation blind spot. Empirically, we demonstrate a structural misalignment between existing evaluation frameworks and authentic user needs. Beyond diagnosis, we advance a new evaluation consensus that explicitly integrates creativity, output diversity, and societal impact, establishing foundational principles for socially responsible AI assessment.
📝 Abstract
Foundation models capable of automating cognitive tasks represent a pivotal technological shift, yet their societal implications remain unclear. These systems promise exciting advances, but they also risk flooding our information ecosystem with formulaic, homogeneous, and potentially misleading synthetic content. Developing benchmarks grounded in the real use cases where these risks are most significant is therefore critical. Through a thematic analysis of 2 million language model user prompts, we identify creative composition tasks as a prevalent usage category in which users seek help with personal tasks that require everyday creativity. Our fine-grained analysis identifies mismatches between current benchmarks and usage patterns among these tasks. Crucially, we argue that the same use cases that currently lack thorough evaluations can lead to negative downstream impacts. This position paper argues that benchmarks focused on creative composition tasks are a necessary step towards understanding the societal harms of AI-generated content. We call for greater transparency in usage patterns to inform the development of new benchmarks that can effectively measure both the progress and the impacts of models with creative capabilities.