Regression Language Models for Code

📅 2025-09-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses code-to-metric regression: predicting numeric outcomes of code execution, such as memory consumption, latency, and model accuracy. The authors propose the Regression Language Model (RLM), a single unified end-to-end regression model that predicts across languages (17 from CodeNet), hardware platforms, and tasks (runtime performance and model accuracy). Initialized from T5Gemma with roughly 300M parameters, the RLM needs no hand-crafted feature engineering and regresses these metrics directly from raw text. On competitive programming submissions from APPS, it exceeds 0.9 Spearman rank correlation; on the CodeNet multilingual benchmark, its average Spearman correlation across the 17 languages is 0.51; and on five classic NAS design spaces, it attains the highest average Kendall-Tau correlation, 0.46, in a setting previously dominated by graph neural networks.

📝 Abstract

We study code-to-metric regression: predicting numeric outcomes of code executions, a challenging task due to the open-ended nature of programming languages. While prior methods have resorted to heavy and domain-specific feature engineering, we show that a single unified Regression Language Model (RLM) can simultaneously predict, directly from text, (i) the memory footprint of code across multiple high-level languages such as Python and C++, (ii) the latency of Triton GPU kernels, and (iii) the accuracy and speed of trained neural networks represented in ONNX. In particular, a relatively small 300M-parameter RLM initialized from T5Gemma obtains > 0.9 Spearman-rank on competitive programming submissions from APPS, and a single unified model achieves > 0.5 average Spearman-rank across 17 separate languages from CodeNet. Furthermore, the RLM obtains the highest average Kendall-Tau of 0.46 on five classic NAS design spaces previously dominated by graph neural networks, and simultaneously predicts architecture latencies on numerous hardware platforms.
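The results above are reported as rank correlations (Spearman and Kendall-Tau), which score how well predictions order the examples rather than how close the predicted values are. The following is a minimal sketch of how such scores could be computed, including a per-language average of the kind quoted for CodeNet. It is not the paper's evaluation code: the function names, the tau-b tie correction, and the toy numbers are assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(pred, true):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(pred), average_ranks(true))


def kendall_tau(pred, true):
    """Kendall tau-b: (concordant - discordant) pairs with a tie correction."""
    conc = disc = only_pred_tied = only_true_tied = 0
    n = len(pred)
    for i in range(n):
        for j in range(i + 1, n):
            dp = pred[i] - pred[j]
            dt = true[i] - true[j]
            if dp == 0 and dt == 0:
                continue  # tied in both lists: counts toward neither factor
            elif dp == 0:
                only_pred_tied += 1
            elif dt == 0:
                only_true_tied += 1
            elif (dp > 0) == (dt > 0):
                conc += 1
            else:
                disc += 1
    denom = sqrt((conc + disc + only_pred_tied) * (conc + disc + only_true_tied))
    return (conc - disc) / denom


def mean_spearman_by_group(records):
    """Average Spearman over groups; records are (group, predicted, measured)."""
    groups = defaultdict(lambda: ([], []))
    for group, pred, true in records:
        groups[group][0].append(pred)
        groups[group][1].append(true)
    scores = {g: spearman(p, t) for g, (p, t) in groups.items()}
    return sum(scores.values()) / len(scores), scores


if __name__ == "__main__":
    # Hypothetical predicted vs. measured memory values, not data from the paper.
    records = [
        ("python", 10.0, 12.0), ("python", 20.0, 18.0), ("python", 30.0, 40.0),
        ("cpp", 1.0, 5.0), ("cpp", 2.0, 4.0), ("cpp", 3.0, 6.0),
    ]
    mean_rho, per_language = mean_spearman_by_group(records)
    print(per_language, mean_rho)
```

Because both metrics depend only on ordering, a model can score well even when its predicted values are off by a consistent scale factor, which is useful to keep in mind when reading the figures quoted above.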

Problem

Research questions and friction points this paper is trying to address.

Predicting numeric outcomes of code executions from text

Estimating memory footprint across multiple programming languages

Simultaneously forecasting the accuracy and latency of trained neural networks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified regression language model predicts code metrics

Model handles multiple languages and execution outcomes

Relatively small 300M-parameter model reaches > 0.9 Spearman rank on APPS

🔎 Similar Papers

Do Current Language Models Support Code Intelligence for R Programming Language?