Beyond Execution: Static-Analysis Rewards and Hint-Conditioned Diffusion RL for Code Generation

📅 2026-05-16

🤖 AI Summary
Diffusion-based language models struggle to learn complex code generation because execution-based reward signals are sparse. This work proposes an execution-free reinforcement learning post-training framework that uses static program analysis as an efficient reward mechanism. To further improve generation quality on challenging tasks, the approach incorporates a conditional sampling strategy guided by abstract syntax tree (AST) hints. Experimentally, the method improves DiffuCoder's pass@1 score on HumanEval from 53.9 to 67.1 and on LiveCodeBench from 14.9 to 15.5, while reducing rollout time by 9.4%. These findings indicate that the best reward design depends strongly on task difficulty in code generation.
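To make the idea of an execution-free reward concrete, the following is a minimal sketch of a static-checking reward that scores a generated program without ever running it. The function name `static_check_reward`, the specific checks, and the 0.5/0.25/0.25 weights are illustrative assumptions, not the reward used in the paper.

```python
import ast


def static_check_reward(code: str) -> float:
    """Hypothetical execution-free reward in [0, 1]; the code is parsed, never run.

    The checks and weights below are assumptions chosen for illustration.
    """
    try:
        tree = ast.parse(code)
    except (SyntaxError, ValueError):
        # Unparseable output earns no reward.
        return 0.0

    reward = 0.5  # the program is syntactically valid

    # Partial credit if the program defines at least one function.
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    if funcs:
        reward += 0.25

    # Partial credit if some defined function contains a return statement.
    if any(isinstance(n, ast.Return) for f in funcs for n in ast.walk(f)):
        reward += 0.25

    return reward


if __name__ == "__main__":
    print(static_check_reward("def add(a, b):\n    return a + b\n"))
    print(static_check_reward("def add(a, b:"))
```

Because a reward of this kind is graded rather than pass/fail, it can still separate better from worse samples on tasks where almost no rollout passes the unit tests.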
📝 Abstract
Reinforcement Learning (RL) is an important paradigm for aligning Diffusion Language Models (DLMs) toward functional correctness in code generation. However, these models often encounter a "capability cliff" on complex tasks, where execution-based semantic rewards become too low to provide a viable learning signal. In this paper, we present a systematic empirical study of RL post-training for diffusion-based code generation along three axes: reward design, hint-conditioned sampling, and task difficulty. We investigate the effectiveness of execution-free rewards as alternatives to traditional unit-test execution, the role of training-time hint-conditioned diffusion sampling in mitigating exploration bottlenecks, and how the impact of these design choices varies across tasks of different difficulty levels. Across HumanEval, MBPP, and LiveCodeBench, we find that static checking is the strongest overall standalone execution-free reward in our setting, improving DiffuCoder from 53.9 to 67.1 on HumanEval and from 14.9 to 15.5 on LiveCodeBench while reducing rollout time by 9.4%. We further find that moderate AST-based hinting is most useful on harder benchmarks, and that the best reward design depends strongly on task difficulty: similarity-based rewards are more effective on easier subsets, whereas static checking is more reliable on harder subsets where execution rewards are low. These findings suggest that reward design and training guidance substantially affect diffusion RL performance in our evaluated code-generation setting.
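As a rough illustration of what an AST-based hint might look like, the sketch below extracts the statement types of a reference solution in source order and reveals only a leading fraction of them. The hint format, the function name `ast_hint`, and the `ratio` parameter are all assumptions made for this example; the abstract does not specify how hints are constructed or injected into diffusion sampling.

```python
import ast


def ast_hint(reference: str, ratio: float) -> list[str]:
    """Return the statement-type names for a leading fraction of a reference solution.

    Hypothetical hint format: `ratio` in [0, 1] controls how much structure
    is revealed (0 reveals nothing, 1 reveals the whole outline).
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")

    tree = ast.parse(reference)

    # ast.walk is breadth-first, so sort statements back into source order.
    stmts = sorted(
        (n for n in ast.walk(tree) if isinstance(n, ast.stmt)),
        key=lambda n: (n.lineno, n.col_offset),
    )
    names = [type(n).__name__ for n in stmts]

    # Reveal only the first `ratio` share of the outline (rounded down).
    return names[: int(ratio * len(names))]


if __name__ == "__main__":
    reference = (
        "def total(xs):\n"
        "    acc = 0\n"
        "    for x in xs:\n"
        "        acc += x\n"
        "    return acc\n"
    )
    print(ast_hint(reference, 0.5))
```

Under this assumed format, a moderate `ratio` exposes the overall shape of a solution (for example, that it is a function beginning with an assignment) while leaving the remaining statements for the model to generate.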
Problem

Research questions and friction points this paper is trying to address.

code generation
reinforcement learning
execution-free rewards
diffusion language models
capability cliff
Innovation

Methods, ideas, or system contributions that make the work stand out.

static-analysis rewards
hint-conditioned diffusion
execution-free reward
diffusion RL
code generation