Playing with Words, Improving with Rewards: Training Language Models for Creative Association

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of objective evaluation criteria for creativity in large language models by training them with Reinforcement Learning with Verifiable Rewards (RLVR). The word-association game Codenames serves as the training task: it exercises both divergent and convergent thinking and yields objectively verifiable outcomes, so training requires no human evaluation. The approach is applied to the Qwen3 model series (1.7B, 4B, and 8B) and evaluated on ten creativity and four reasoning benchmarks. The 8B model shows modest but consistent creativity gains on 8 of 10 creativity benchmarks with only minor reasoning degradation, while the 1.7B and 4B models achieve substantial gains on reasoning tasks at the cost of creativity. These findings indicate that the trade-off between precision and diversity depends on model scale.

📝 Abstract

Large Language Models (LLMs) are being applied to increasingly difficult problems and use cases. To navigate their vast solution spaces effectively, LLMs need to be creative. Yet the subjective nature of creativity and the limits of human judgment make training LLMs for creativity especially challenging. As a solution, we train LLMs on Codenames, a word-association game that exercises the two central axes of creativity, divergent and convergent thinking, while yielding objectively verifiable outcomes. This verifiability lets us bypass human judgment and train with Reinforcement Learning with Verifiable Rewards (RLVR). We train Qwen3-1.7B, 4B, and 8B models and evaluate them on ten creativity and four reasoning benchmarks. We find that the precision-diversity trade-off is scale-dependent: the 8B model prioritizes creativity over precision, while the 1.7B and 4B models gain reasoning precision at the cost of creativity. Concretely, the 8B model shows modest but consistent creativity gains (8 of 10 benchmarks) with only minor reasoning degradation, whereas the smaller models achieve substantial gains on reasoning tasks. Our study presents a scalable and effective solution to train LLMs for creativity.
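
To make "objectively verifiable outcomes" concrete, here is a minimal Python sketch of how one Codenames clue could be scored without human judgment. It is an illustration under stated assumptions, not the paper's reward: the function name `clue_reward`, the scoring scale, and the rule that a turn ends on the first non-target guess are all hypothetical choices made for this example.

```python
from typing import Sequence, Set


def clue_reward(guesses: Sequence[str], targets: Set[str], assassin: str) -> float:
    """Hypothetical verifiable reward for a single Codenames clue.

    guesses:  words the guesser picked, in order, after seeing the clue
    targets:  board words the clue-giver intended to indicate
    assassin: the board word that must never be picked

    Returns -1.0 if the assassin is picked, otherwise the fraction of
    targets found before the first incorrect guess (assumed scoring rule).
    """
    if not targets:
        raise ValueError("targets must be non-empty")

    remaining = set(targets)  # copy so the caller's set is not mutated
    for word in guesses:
        if word == assassin:
            return -1.0  # worst outcome in this sketch
        if word not in remaining:
            break  # assumed rule: the turn ends on the first miss
        remaining.remove(word)

    return (len(targets) - len(remaining)) / len(targets)


# Example: the clue targeted three words and the guesser found two, then missed.
reward = clue_reward(["ocean", "river", "car"], {"ocean", "river", "lake"}, "bomb")
```

Because the score depends only on the board and the guesses, it can be computed automatically for every rollout, which is the property RLVR relies on. A clue that links more target words earns more only if the guesser actually recovers them, so a rule of this kind rewards both reaching for distant associations and keeping them precise.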

Problem

Research questions and friction points this paper is trying to address.

creativity

large language models

word association

reinforcement learning

verifiable rewards

Innovation

Methods, ideas, or system contributions that make the work stand out.

Codenames

creativity training

Reinforcement Learning with Verifiable Rewards