Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction

📅 2025-04-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) exhibit myopia, excessive memorization, and limited diversity in open-ended creative tasks (such as drawing analogies, discovering new knowledge-graph connections, designing math problems, or designing proteins) due to their reliance on next-token prediction. Method: the paper introduces (1) a suite of minimal algorithmic tasks that cleanly and controllably quantifies LLM limits in open-ended stochastic planning and original generation; (2) hash-conditioning, an input-layer noise-injection mechanism offered as an alternative to temperature sampling at the output layer; and (3) empirical evidence that multi-token approaches, namely teacherless training and diffusion models, outperform next-token learning at producing diverse and original output. Contribution/Results: experiments demonstrate gains over standard next-token training in generative diversity, originality, and coherence. Part of the code is publicly released.

📝 Abstract
We design a suite of minimal algorithmic tasks that are a loose abstraction of open-ended real-world tasks. This allows us to cleanly and controllably quantify the creative limits of the present-day language model. Much like real-world tasks that require a creative, far-sighted leap of thought, our tasks require an implicit, open-ended stochastic planning step that either (a) discovers new connections in an abstract knowledge graph (like in wordplay, drawing analogies, or research) or (b) constructs new patterns (like in designing math problems or new proteins). In these tasks, we empirically and conceptually argue how next-token learning is myopic and memorizes excessively; comparatively, multi-token approaches, namely teacherless training and diffusion models, excel in producing diverse and original output. Secondly, in our tasks, we find that to elicit randomness from the Transformer without hurting coherence, it is better to inject noise right at the input layer (via a method we dub hash-conditioning) rather than defer to temperature sampling from the output layer. Thus, our work offers a principled, minimal test-bed for analyzing open-ended creative skills, and offers new arguments for going beyond next-token learning and softmax-based sampling. We make part of the code available under https://github.com/chenwu98/algorithmic-creativity
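The contrast the abstract draws between output-layer randomness (temperature sampling) and input-layer randomness (hash-conditioning) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the linear map `W` stands in for a Transformer forward pass, and all names, shapes, and the noise scale are assumptions.

```python
import numpy as np

def temperature_sample(logits, temperature, rng):
    """Output-layer randomness: flatten the softmax, then sample a token."""
    z = logits / temperature
    p = np.exp(z - z.max())   # numerically stable softmax
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

def hash_conditioned_greedy(x, W, noise_scale, rng):
    """Input-layer randomness (hash-conditioning sketch): perturb the input
    embedding with a random seed vector, then decode greedily. The model,
    rather than the output sampler, is responsible for turning the random
    seed into a varied but coherent output."""
    seed = noise_scale * rng.standard_normal(x.shape)
    logits = (x + seed) @ W   # toy stand-in for a Transformer forward pass
    return int(np.argmax(logits))

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 5))   # embedding dim 8, toy vocabulary of 5 tokens
x = rng.standard_normal(8)        # one fixed input embedding
tokens = {hash_conditioned_greedy(x, W, 2.0, rng) for _ in range(50)}
```

In this toy setup, varying the input-layer seed steers greedy decoding to different tokens, while any single seed decodes deterministically; temperature sampling instead injects all its randomness at the final softmax.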
Problem

Research questions and friction points this paper is trying to address.

Quantify creative limits of language models
Compare next-token vs multi-token approaches
Improve randomness without losing coherence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-token approaches enhance diverse output
Hash-conditioning injects noise at input layer
Tasks quantify creative limits of models
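A hedged sketch of how teacherless (multi-token) training differs from standard teacher forcing on the input side; the helper name and the BOS/pad-token convention are illustrative assumptions, not the paper's code.

```python
def make_decoder_inputs(targets, bos_id, pad_id, teacherless):
    """Build the decoder input sequence for one training example.

    Teacher forcing (next-token): position t sees the true tokens before t,
    so the model can lean on the ground-truth prefix instead of planning.
    Teacherless (multi-token): every position sees only a dummy token, so
    the model must commit to the whole sequence in one shot.
    """
    if teacherless:
        return [pad_id] * len(targets)
    return [bos_id] + targets[:-1]

targets = [7, 3, 9]
teacher_forced_in = make_decoder_inputs(targets, bos_id=1, pad_id=0, teacherless=False)
teacherless_in = make_decoder_inputs(targets, bos_id=1, pad_id=0, teacherless=True)
```

Both variants train against the same targets; only what the model is allowed to condition on changes, which is what forces the teacherless model to plan the full output rather than copy memorized continuations of the prefix.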