Within-Model vs Between-Prompt Variability in Large Language Models for Creative Tasks

📅 2026-01-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the need to quantify the sources of variation in large language models’ outputs on creative tasks, disentangling the contributions of prompts, model choice, and sampling randomness. By generating 100 samples per prompt across 12 models for 10 creative prompts (yielding 12,000 total outputs), the authors employ variance decomposition to systematically analyze the components of variance in originality and fluency. Their analysis reveals, for the first time, that prompt selection accounts for 36.43% of the variance in originality—nearly matching the 40.94% attributable to model choice. In contrast, fluency is predominantly driven by model choice (51.25%) and within-model stochasticity (33.70%), with prompts contributing only 4.22%. These findings underscore the susceptibility of single-sample evaluations to sampling noise and highlight the necessity of multi-sample generation and controlled experimental designs in assessing creative language generation.
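The variance decomposition described above can be sketched with a two-way crossed random-effects ANOVA: scores vary by model, by prompt, by model-prompt interaction, and by within-cell sampling noise, and each component's share of total variance is estimated from expected mean squares. The sketch below uses synthetic scores with made-up variance components (not the paper's data or code); all numbers and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Crossed design mirroring the paper's setup:
# p models x q prompts x n samples per model-prompt cell.
p, q, n = 12, 10, 100

# Hypothetical true variance components, for illustration only.
var_model, var_prompt, var_inter, var_noise = 4.0, 2.25, 0.25, 1.0

a = rng.normal(0, np.sqrt(var_model), p)[:, None, None]     # model effects
b = rng.normal(0, np.sqrt(var_prompt), q)[None, :, None]    # prompt effects
ab = rng.normal(0, np.sqrt(var_inter), (p, q))[:, :, None]  # interaction
e = rng.normal(0, np.sqrt(var_noise), (p, q, n))            # sampling noise
y = a + b + ab + e                                          # scores, shape (p, q, n)

# Mean squares for the two-way crossed random-effects model.
g = y.mean()
m_i = y.mean(axis=(1, 2))   # per-model means
m_j = y.mean(axis=(0, 2))   # per-prompt means
m_ij = y.mean(axis=2)       # per-cell means

ms_model = q * n * np.sum((m_i - g) ** 2) / (p - 1)
ms_prompt = p * n * np.sum((m_j - g) ** 2) / (q - 1)
ms_inter = (n * np.sum((m_ij - m_i[:, None] - m_j[None, :] + g) ** 2)
            / ((p - 1) * (q - 1)))
ms_error = np.sum((y - m_ij[:, :, None]) ** 2) / (p * q * (n - 1))

# Method-of-moments estimates from expected mean squares, clipped at zero.
s2_e = ms_error
s2_ab = max((ms_inter - ms_error) / n, 0.0)
s2_model = max((ms_model - ms_inter) / (q * n), 0.0)
s2_prompt = max((ms_prompt - ms_inter) / (p * n), 0.0)

total = s2_model + s2_prompt + s2_ab + s2_e
for name, s2 in [("model", s2_model), ("prompt", s2_prompt),
                 ("interaction", s2_ab), ("within (sampling)", s2_e)]:
    print(f"{name:>18}: {100 * s2 / total:5.1f}% of variance")
```

With 100 samples per cell the within-cell (sampling) component is estimated precisely, while the model and prompt components carry much wider uncertainty (only 11 and 9 degrees of freedom, respectively), which is exactly why the paper argues for multi-sample, controlled designs over single-sample evaluations.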

📝 Abstract
How much of LLM output variance is explained by prompts, by model choice, and by sampling stochasticity? We answer this by evaluating 12 LLMs on 10 creativity prompts with 100 samples each (N = 12,000). For output quality (originality), prompts explain 36.43% of variance, comparable to model choice (40.94%). But for output quantity (fluency), model choice (51.25%) and within-LLM variance (33.70%) dominate, with prompts explaining only 4.22%. Prompts are powerful levers for steering output quality, but given the substantial within-LLM variance (10-34%), single-sample evaluations risk conflating sampling noise with genuine prompt or model effects.
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Prompt Variability
Model Variability
Creativity Tasks
Output Variance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Within-Model Variability
Between-Prompt Variability
Large Language Models
Creativity Evaluation
Output Variance