BatCoder: Self-Supervised Bidirectional Code-Documentation Learning via Back-Translation

📅 2026-01-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the scarcity of high-quality code-documentation pairs in niche programming languages, which limits the performance of large code models. The authors propose a self-supervised reinforcement learning framework that jointly optimizes code and documentation generation through a back-translation mechanism: documentation is first generated from code, then new code is reconstructed from that documentation, with the semantic similarity between the original and reconstructed code serving as an implicit reward signal. Requiring only raw code data and no human annotations, this approach substantially expands the scale of effective training data. Evaluated on a 7B-parameter model, the method achieves pass@1 scores of 83.5% on HumanEval and 81.0% on MBPP, outperforming mainstream open-source baselines, with consistent performance gains observed as both data volume and model size increase.

📝 Abstract
Training LLMs for code-related tasks typically depends on high-quality code-documentation pairs, which are costly to curate and often scarce for niche programming languages. We introduce BatCoder, a self-supervised reinforcement learning framework designed to jointly optimize code generation and documentation production. BatCoder employs a back-translation strategy: documentation is first generated from code, and then the generated documentation is used to reconstruct the original code. The semantic similarity between the original and reconstructed code serves as an implicit reward, enabling reinforcement learning to improve the model's performance both in generating code from documentation and vice versa. This approach allows models to be trained using only raw code, substantially increasing the number of available training examples. Evaluated on HumanEval and MBPP with a 7B model, BatCoder achieves 83.5% and 81.0% pass@1 respectively, outperforming strong open-source baselines. Moreover, the framework demonstrates consistent scaling with respect to both training corpus size and model capacity.
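The back-translation reward loop described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the two model passes (`code_to_doc`, `doc_to_code`) are stand-in stubs, and a toy token-level Jaccard overlap is used in place of whatever semantic similarity measure BatCoder actually employs.

```python
def code_to_doc(code: str) -> str:
    # Stand-in for the model's code -> documentation pass (hypothetical).
    return "Add two numbers and return their sum."


def doc_to_code(doc: str) -> str:
    # Stand-in for the model's documentation -> code pass (hypothetical).
    return "def add(a, b):\n    return a + b"


def semantic_similarity(a: str, b: str) -> float:
    # Toy token-level Jaccard similarity as a placeholder reward;
    # the paper's actual semantic measure is not specified here.
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0


def back_translation_reward(code: str) -> float:
    """Implicit reward: how well the round-trip preserves the code."""
    doc = code_to_doc(code)              # step 1: code -> documentation
    reconstructed = doc_to_code(doc)     # step 2: documentation -> code
    return semantic_similarity(code, reconstructed)


original = "def add(a, b):\n    return a + b"
reward = back_translation_reward(original)  # higher when round-trip preserves semantics
```

In an actual training run, this scalar reward would feed a reinforcement-learning update that jointly improves both generation directions; only raw code is needed, since the documentation side is produced by the model itself.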
Problem

Research questions and friction points this paper is trying to address.

code-documentation pairs
data scarcity
programming languages
LLM training
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-supervised learning
back-translation
code-documentation alignment
reinforcement learning
code generation
Authors

Jingwen Xu - College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China
Yiyang Lu - College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China
Zisu Huang - College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China
Changze Lv - College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China
Xiaohua Wang - Fudan University
Shizheng Li - College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China
Zhibo Xu - Fudan University
Zhengkang Guo - College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China
Zhengyuan Wang - College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China
Muzhao Tian - College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China
Xuanjing Huang - College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China
Xiaoqing Zheng - Fudan University