Efficient Post-training of LLMs for Code Generation With Offline Reinforcement Learning

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the cost of online reinforcement learning in the post-training of large code generation models, where every update requires LLM inference and verification of the generated code. It presents the first systematic application of offline reinforcement learning to this setting, so that post-training uses only existing code datasets and needs no expensive online interaction. Experimental results show that the approach significantly reduces training overhead while improving model performance, with the largest gains on smaller LLMs and on complex programming tasks.

📝 Abstract

Post-training using online reinforcement learning (RL) is an important training step for LLMs, including code-generating models. However, online RL for code generation involves LLM inference and verification of the generated output, which can take considerable time and resources. In this paper, we explore the application of offline RL to code-generating models by leveraging existing code datasets. Our experiments demonstrate that offline RL is an effective training strategy for improving LLM performance. We show that offline RL can be especially beneficial for small LLMs and challenging coding problems.
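To make the online/offline distinction concrete, here is a minimal sketch of an offline update on a toy problem. The abstract does not state which offline objective the paper uses, so this sketch assumes reward-weighted regression as a stand-in. The logged dataset, the three-candidate softmax "policy", and all function names are hypothetical. The point is only that rewards are read from a fixed dataset, with no generation or test execution inside the training loop.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def offline_rwr_step(logits, dataset, beta=1.0, lr=0.5):
    """One gradient-ascent step on the mean of w_i * log pi(a_i).

    Assumed objective (reward-weighted regression), not the paper's:
    w_i = exp((r_i - mean reward) / beta).
    """
    probs = softmax(logits)
    baseline = sum(r for _, r in dataset) / len(dataset)
    grad = [0.0] * len(logits)
    for action, reward in dataset:
        weight = math.exp((reward - baseline) / beta)
        # Gradient of log softmax(logits)[action] w.r.t. logit k
        # is 1[k == action] - probs[k].
        for k in range(len(logits)):
            indicator = 1.0 if k == action else 0.0
            grad[k] += weight * (indicator - probs[k])
    return [l + lr * g / len(dataset) for l, g in zip(logits, grad)]

# Hypothetical logged data for one prompt:
# (index of a stored candidate solution, fraction of unit tests it passed).
# The rewards were computed once, before training; nothing is re-verified here.
LOGGED = [(0, 0.0), (1, 1.0), (2, 0.5), (1, 1.0), (0, 0.0)]

def train(steps=50):
    """Run offline updates from a uniform policy and return final probabilities."""
    logits = [0.0, 0.0, 0.0]
    for _ in range(steps):
        logits = offline_rwr_step(logits, LOGGED)
    return softmax(logits)

if __name__ == "__main__":
    print(train())
```

Calling `train()` shifts probability mass toward candidate 1, the only logged solution that passed every test. An online method would instead sample fresh solutions from the current policy and run the tests at every step, which is the inference and verification cost the abstract refers to.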

Problem

Research questions and friction points this paper is trying to address.

post-training

code generation

offline reinforcement learning

large language models

computational efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

offline reinforcement learning

code generation

post-training