🤖 AI Summary
Decoding-based regression models numerical prediction as sequence generation, but token-level cross-entropy loss is fundamentally misaligned with continuous target values, causing scale inaccuracy and limited generalization. This paper proposes the first reinforcement learning–based decoding framework for regression: it formalizes the generation process as a Markov decision process and replaces conventional token-level supervision with sequence-level numerical consistency rewards, e.g., the reciprocal of absolute error. It integrates ReMax and GRPO to optimize large language model (LLM) policies for numerical generation. Evaluated on tabular regression and code metric regression tasks, the method significantly outperforms state-of-the-art approaches, achieving a 12.7% average reduction in MAE, a 2.3× improvement in sampling efficiency, and more stable cross-domain generalization. To the authors' knowledge, this is the first work to demonstrate systematic performance gains from sequence-level reward optimization in decoding-based regression.
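The sequence-level reward mentioned above, the reciprocal of absolute error, can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the `eps` smoothing term and the zero reward for unparseable generations are assumptions added to make the sketch well-defined.

```python
def sequence_reward(pred_text: str, target: float, eps: float = 1e-6) -> float:
    """Sequence-level numerical consistency reward: the reciprocal of the
    absolute error between the decoded number and the ground-truth value.

    The eps smoothing term and the parse-failure penalty are illustrative
    assumptions, not details specified in the paper.
    """
    try:
        pred = float(pred_text.strip())
    except ValueError:
        return 0.0  # unparseable generations receive no reward
    return 1.0 / (abs(pred - target) + eps)
```

Because the reward is computed on the fully decoded number rather than per token, a generation that is numerically close but token-wise different (e.g. "2.99" vs. "3.0") still scores highly, which is exactly the global-magnitude signal that token-level cross-entropy misses.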
📝 Abstract
Decoding-based regression, which reformulates regression as a sequence generation task, has emerged as a promising paradigm for applying large language models to numerical prediction. However, its progress is hindered by the misalignment between discrete token-level objectives (e.g., cross-entropy) and continuous numerical values. Existing approaches that rely on token-level constraints often fail to capture the global magnitude of the target value, limiting their precision and generalization. In this paper, we propose to unlock the potential of decoding-based regression via Reinforcement Learning (RL). We formulate the generation process as a Markov Decision Process, using sequence-level rewards to enforce global numerical coherence. Extensive experiments on tabular regression and code metric regression demonstrate that our method (specifically with ReMax and GRPO) consistently outperforms both state-of-the-art token-level baselines and traditional regression heads, underscoring the value of sequence-level supervision signals. Our analysis further reveals that RL significantly improves sampling efficiency and predictive precision, establishing decoding-based regression as a robust and accurate paradigm for general-purpose numerical prediction.
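As a rough illustration of how a GRPO-style optimizer could consume such sequence-level rewards, the sketch below computes group-relative advantages over several completions sampled for the same prompt. The normalization (group mean and standard deviation, plus an `eps` constant) follows the standard GRPO recipe and is an assumption here, not the paper's exact formulation.

```python
import statistics

def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Group-relative advantages in the style of GRPO: each sampled
    completion's reward is normalized by the group's mean and standard
    deviation, so no learned value function (critic) is required."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: sequence-level rewards (e.g. reciprocal absolute error against
# the target value) for four generations sampled from the same prompt.
advs = grpo_advantages([0.5, 1.0, 2.0, 0.5])
```

Completions that land closer to the ground-truth value receive positive advantages and are reinforced; the rest are suppressed, which is how the sequence-level numerical signal reaches the policy gradient.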