🤖 AI Summary
Decoding-based regression models numerical prediction as sequence generation, but token-level cross-entropy loss is fundamentally misaligned with continuous target values, causing scale inaccuracy and limited generalization. This paper proposes the first reinforcement learning–based decoding framework for regression: it formalizes the generation process as a Markov decision process and replaces conventional token-level supervision with sequence-level numerical consistency rewards, e.g., the reciprocal of absolute error. It integrates ReMax and GRPO to optimize large language model (LLM) policies for numerical generation. Evaluated on tabular regression and code metric regression tasks, the method significantly outperforms state-of-the-art approaches, achieving a 12.7% average reduction in MAE, a 2.3× improvement in sampling efficiency, and more stable cross-domain generalization. To the authors' knowledge, this is the first work to demonstrate systematic performance gains from sequence-level reward optimization in decoding-based regression.
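The sequence-level reward mentioned above, the reciprocal of absolute error, can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the `eps` smoothing term and the zero reward for unparseable generations are assumptions added to make the sketch well-defined.

```python
def sequence_reward(pred_text: str, target: float, eps: float = 1e-6) -> float:
    """Sequence-level numerical consistency reward: the reciprocal of the
    absolute error between the decoded number and the ground-truth value.

    The eps smoothing term and the parse-failure penalty are illustrative
    assumptions, not details specified in the paper.
    """
    try:
        pred = float(pred_text.strip())
    except ValueError:
        return 0.0  # unparseable generations receive no reward
    return 1.0 / (abs(pred - target) + eps)
```

Because the reward is computed on the fully decoded number rather than per token, a generation that is numerically close but token-wise different (e.g. "2.99" vs. "3.0") still scores highly, which is exactly the global-magnitude signal that token-level cross-entropy misses.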
📝 Abstract
Decoding-based regression, which reformulates regression as a sequence generation task, has emerged as a promising paradigm for applying large language models to numerical prediction. However, its progress is hindered by the misalignment between discrete token-level objectives (e.g., cross-entropy) and continuous numerical values. Existing approaches that rely on token-level constraints often fail to capture the global magnitude of the target value, limiting their precision and generalization. In this paper, we propose to unlock the potential of decoding-based regression via Reinforcement Learning (RL). We formulate the generation process as a Markov Decision Process, using sequence-level rewards to enforce global numerical coherence. Extensive experiments on tabular regression and code metric regression demonstrate that our method (specifically with ReMax and GRPO) consistently outperforms both state-of-the-art token-level baselines and traditional regression heads, underscoring the value of sequence-level supervision signals. Our analysis further reveals that RL significantly improves sampling efficiency and predictive precision, establishing decoding-based regression as a robust and accurate paradigm for general-purpose numerical prediction.
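As a rough illustration of how a GRPO-style optimizer could consume such sequence-level rewards, the sketch below computes group-relative advantages over several completions sampled for the same prompt. The normalization (group mean and standard deviation, plus an `eps` constant) follows the standard GRPO recipe and is an assumption here, not the paper's exact formulation.

```python
import statistics

def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Group-relative advantages in the style of GRPO: each sampled
    completion's reward is normalized by the group's mean and standard
    deviation, so no learned value function (critic) is required."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: sequence-level rewards (e.g. reciprocal absolute error against
# the target value) for four generations sampled from the same prompt.
advs = grpo_advantages([0.5, 1.0, 2.0, 0.5])
```

Completions that land closer to the ground-truth value receive positive advantages and are reinforced; the rest are suppressed, which is how the sequence-level numerical signal reaches the policy gradient.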