AI Summary
Large language models (LLMs) exhibit weak code-generation performance for low-resource programming languages (e.g., Lua, Julia). Method: This paper proposes a language-agnostic reinforcement learning (RL) post-training framework that eliminates the need for language-specific datasets and evaluation tools. Instead, it introduces a universal, behavior-based evaluation mechanism grounded solely in externally validated unit tests with standardized I/O formats, enabling feedback without syntactic or linguistic assumptions. The framework achieves cross-language adaptability via lightweight configuration and the RLVR algorithm, operating within a robust execution environment. Contribution/Results: Experiments demonstrate substantial performance gains for Qwen-3 4B across five low-resource languages. On MultiPL-E and LiveCodeBench, it surpasses models of the same parameter scale and matches the performance of 16B–70B models, marking the first demonstration of efficient multilingual generalization by a compact LLM in code generation.
Abstract
Large language models (LLMs) already excel at writing code in high-resource languages such as Python and JavaScript, yet stumble on low-resource languages that remain essential to science and engineering. Besides the obvious shortage of pre-training data, post-training itself is a bottleneck: every new language seems to require new datasets, test harnesses, and reinforcement-learning (RL) infrastructure.
We introduce Agnostics, a language-agnostic post-training pipeline that eliminates this per-language engineering. The key idea is to judge code solely by its externally observable behavior, so a single verifier can test solutions written in any language. Concretely, we (i) use an LLM to rewrite existing unit-test datasets into an I/O format, (ii) supply a short configuration that tells the verifier how to compile and run a target language, and (iii) apply reinforcement learning with verifiable rewards (RLVR) in a robust code execution environment.
Applied to five low-resource languages (Lua, Julia, R, OCaml, and Fortran), Agnostics (1) improves Qwen-3 4B to performance that rivals other 16B–70B open-weight models; (2) scales cleanly to larger and more diverse model families (Qwen-3 8B, DeepSeek Coder 6.7B Instruct, Phi 4 Mini); and (3) among models with $\le 16$B parameters, sets new state-of-the-art pass@1 results on MultiPL-E and on a new multi-language version of LiveCodeBench that we introduce.
We will release the language-agnostic training datasets (Ag-MBPP-X, Ag-Codeforces-X, Ag-LiveCodeBench-X), training code, and ready-to-use configurations, making RL post-training in any programming language as simple as editing a short YAML file.
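A per-language configuration of the kind described above might look like the fragment below. This is a hedged sketch: the field names and schema are illustrative assumptions, not the released Agnostics format, but they capture what the abstract says such a file must convey, namely how to compile and run a target language.

```yaml
# Illustrative language configuration (hypothetical schema).
# {src} and {bin} are placeholders the verifier substitutes at run time.
language: fortran
file_extension: .f90
compile: "gfortran {src} -o {bin}"   # omitted for interpreted languages
run: "{bin}"
timeout_seconds: 10
```

With behavior-based evaluation, this short file is the only language-specific artifact needed to start RL post-training in a new language.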