CWM: An Open-Weights LLM for Research on Code Generation with World Models

📅 2025-09-30
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work targets the limited reasoning and planning capabilities of code generation models trained on static code alone. It introduces Code World Model (CWM), a 32-billion-parameter open-weights LLM released for research on code generation with world models. CWM is mid-trained on a large number of observation-action trajectories from Python interpreter and agentic Docker environments, followed by supervised fine-tuning (SFT) and extensive multi-task reasoning RL in verifiable coding, math, and multi-turn software engineering environments. It is a dense, decoder-only model trained with a context size of up to 131k tokens, and it supports step-by-step simulation of Python code execution. Independent of its world modeling capabilities, CWM reaches pass@1 scores of 65.8% on SWE-bench Verified (with test-time scaling), 68.6% on LiveCodeBench, 96.6% on Math-500, and 76.0% on AIME 2024. The authors present first steps showing how world models can benefit agentic coding, along with early results on how reasoning can benefit from execution simulation. Checkpoints after mid-training, SFT, and RL are released to support further research.

📝 Abstract
We release Code World Model (CWM), a 32-billion-parameter open-weights LLM, to advance research on code generation with world models. To improve code understanding beyond what can be learned from training on static code alone, we mid-train CWM on a large amount of observation-action trajectories from Python interpreter and agentic Docker environments, and perform extensive multi-task reasoning RL in verifiable coding, math, and multi-turn software engineering environments. With CWM, we provide a strong testbed for researchers to explore the opportunities world modeling affords for improving code generation with reasoning and planning in computational environments. We present first steps of how world models can benefit agentic coding, enable step-by-step simulation of Python code execution, and show early results of how reasoning can benefit from the latter. CWM is a dense, decoder-only LLM trained with a context size of up to 131k tokens. Independent of its world modeling capabilities, CWM offers strong performance on general coding and math tasks: it reaches pass@1 scores of 65.8% on SWE-bench Verified (with test-time scaling), 68.6% on LiveCodeBench, 96.6% on Math-500, and 76.0% on AIME 2024. To support further research on code world modeling, we release model checkpoints after mid-training, SFT, and RL.
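
The scores above are reported as pass@1. As a rough illustration of how such a number can be computed, here is a minimal sketch of the commonly used unbiased pass@k estimator. The abstract does not say how many samples per problem were drawn or which estimator was used, so the function name `pass_at_k` and the sample counts below are assumptions for illustration only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which are correct.

    This is the standard combinatorial estimator: one minus the probability
    that a random subset of k samples contains no correct sample.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every subset of size k has a hit.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical example: 10 samples for one problem, 3 of them correct.
# For k = 1 the estimator reduces to the fraction of correct samples, c / n.
print(pass_at_k(n=10, c=3, k=1))
```

A benchmark-level pass@1 would then be the mean of this per-problem value over all problems.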
Problem

Research questions and friction points this paper is trying to address.

Advancing code generation research using world models and large language models
Improving code understanding through interpreter and agentic environment training
Enabling step-by-step code execution simulation and reasoning in computational environments
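
To make "step-by-step code execution" concrete, the sketch below records one possible observation-action trajectory for a small function: each step stores the line about to run (the action) and a snapshot of the local variables (the observation). It relies only on the standard library's `sys.settrace`. The record layout, the helper name `trace_execution`, and the toy function `running_sum` are assumptions for illustration and are not the trace format used to train CWM.

```python
import sys
from typing import Any, Callable


def trace_execution(func: Callable[..., Any], *args: Any):
    """Run func(*args) and record one entry per traced step inside func."""
    steps = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace into other Python functions
        if event in ("line", "return"):
            steps.append({
                "event": event,
                # Line offset relative to the `def` line of func.
                "line": frame.f_lineno - code.co_firstlineno,
                # Shallow snapshot of the locals at this step.
                "locals": dict(frame.f_locals),
                "return_value": arg if event == "return" else None,
            })
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the previous tracer
    return result, steps


def running_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total


result, steps = trace_execution(running_sum, 3)
for step in steps:
    print(step["event"], step["line"], step["locals"])
```

A "line" event fires before the line executes, so each snapshot shows the state the next statement will see. A model that simulates execution would be asked to predict these snapshots without running the code.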
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mid-training on Python interpreter trajectories for code understanding
Multi-task reasoning RL in verifiable coding environments
World modeling enables step-by-step Python code execution simulation
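
In a "verifiable" coding environment, the reward can be checked mechanically rather than judged. The sketch below shows one simple way to do that: run a candidate solution together with its tests in a fresh Python process and return a binary reward. This is an assumption-laden illustration, not the reward function from the paper. The name `verifiable_reward`, the binary scoring, and the timeout are hypothetical, and a real setup would need proper sandboxing for untrusted code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def verifiable_reward(candidate_source: str, test_source: str,
                      timeout_s: float = 10.0) -> float:
    """Return 1.0 if the candidate passes all tests, else 0.0.

    The candidate and its assert-based tests are written to one script and
    executed in a separate interpreter. A nonzero exit code (failed assert,
    exception, syntax error) or a timeout yields a reward of 0.0.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "check.py"
        script.write_text(candidate_source + "\n\n" + test_source,
                          encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return 0.0
    return 1.0 if proc.returncode == 0 else 0.0


# Hypothetical task: implement add(a, b).
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"
good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
print(verifiable_reward(good, tests), verifiable_reward(bad, tests))
```

The same pattern extends to math tasks by comparing a final answer against a reference value instead of running tests.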
Authors
Quentin Carbonneaux
FAIR CodeGen team
Gal Cohen
FAIR CodeGen team
Jonas Gehring
Facebook AI Research
Neural Networks, Machine Learning
Jacob Kahn
FAIR, Meta AI | University of Pennsylvania
machine learning, deep learning, artificial intelligence
Jannik Kossen
FAIR, Meta
Felix Kreuk
Meta AI, FAIR
Deep learning, Machine learning, Speech Processing, Security
Emily McMilin
Machine Learning Engineer, Facebook
Causal Inference, Offline RL, NLP
Michel Meyer
FAIR CodeGen team
Yuxiang Wei
FAIR CodeGen team
David Zhang
FAIR CodeGen team
Kunhao Zheng
Meta FAIR
Code Generation, Reasoning, Reinforcement Learning, Large Language Model, Theorem Proving
Jordi Armengol-Estapé
FAIR, Meta
Deep Learning, Natural Language Processing, Transformers, Machine Learning for Code, HPC
Pedram Bashiri
FAIR CodeGen team
Maximilian Beck
ELLIS PhD Student, Institute for Machine Learning, JKU Linz
Machine Learning, Deep Learning, Natural Language Processing
Pierre Chambon
FAIR, META
Natural Language Processing
Abhishek Charnalia
FAIR CodeGen team
Chris Cummins
FAIR CodeGen team
Juliette Decugis
FAIR CodeGen team
Zacharias V. Fisches
FAIR @ Meta
LLM, Graph Neural Networks
François Fleuret
University of Geneva
machine learning
Fabian Gloeckle
FAIR CodeGen team
Alex Gu
MIT
program synthesis, machine learning, large language models, code generation
Michael Hassid
Meta FAIR, Hebrew University of Jerusalem
Natural Language Processing, Speech, Artificial Intelligence
Daniel Haziza
Facebook AI Research (FAIR)