Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution

📅 2026-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the collapse of curriculum diversity in co-evolutionary language models, where problem generators often converge to narrow distributions that impede sustained learning by solvers. To counter this, the authors propose a lexical dropout mechanism that applies random, hard, and non-stationary vocabulary masks to the generator’s output logits, serving as an explicit constraint on the action space—akin to rule enforcement in self-play settings. Implemented within the R-Zero framework using a Qwen3-4B generator and a Qwen3-8B solver, the approach yields consistent gains in mathematical reasoning: the 8B solver improves by 4.4 points on average, with pronounced advances on competition-level benchmarks, while maintaining generation diversity across lexical, semantic, and functional dimensions throughout training.
📝 Abstract
Co-evolutionary self-play, where one language model generates problems and another solves them, promises autonomous curriculum learning without human supervision. In practice, the proposer quickly converges to a narrow distribution of problems that satisfy the reward function. This diversity collapse renders the curriculum uninformative for the solver, stalling the co-evolutionary loop. We introduce vocabulary dropout, a random mask applied to the proposer's output logits during both policy training and curriculum generation, as a lightweight mechanism to sustain diversity. The mask is hard and non-stationary, preventing the proposer from locking into fixed token sequences. Training Qwen3-4B and Qwen3-8B on mathematical reasoning via R-Zero, we find that vocabulary dropout sustains proposer diversity across lexical, semantic, and functional metrics throughout training, and yields solver improvements averaging +4.4 points at 8B, with the largest gains on competition-level benchmarks. Our findings suggest that explicit action-space constraints, analogous to the structural role that game rules play in classical self-play, can help sustain productive co-evolution in language. Vocabulary dropout is one simple instantiation of this principle.
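The masking mechanism the abstract describes (a hard, non-stationary random mask over the proposer's output logits) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the function names, the NumPy formulation, and the toy drop rate are all assumptions.

```python
import numpy as np

def sample_vocab_mask(vocab_size, drop_rate, rng):
    # Non-stationary: a fresh random subset of the vocabulary is drawn
    # on every call, so the set of forbidden tokens keeps changing and
    # the proposer cannot lock into a fixed token sequence.
    return rng.random(vocab_size) < drop_rate

def apply_vocab_dropout(logits, dropped):
    # Hard mask: dropped tokens get -inf logits, so softmax assigns them
    # exactly zero probability and they can never be sampled.
    masked = logits.copy()
    masked[dropped] = -np.inf
    return masked

# Toy 4-token vocabulary with an explicit mask for illustration.
logits = np.array([1.0, 2.0, 3.0, 4.0])
dropped = np.array([True, False, True, False])
masked = apply_vocab_dropout(logits, dropped)
probs = np.exp(masked - masked.max())
probs /= probs.sum()
# probs[0] and probs[2] are exactly 0; the rest renormalize to sum to 1.
```

In a training loop, `sample_vocab_mask` would presumably be redrawn at each generation round and applied both when sampling curriculum problems and when computing the policy gradient, matching the paper's claim that the mask is active during both policy training and curriculum generation.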
Problem

Research questions and friction points this paper is trying to address.

co-evolution
curriculum diversity
vocabulary dropout
self-play
language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

vocabulary dropout
co-evolutionary self-play
curriculum diversity
language models
autonomous curriculum learning