Decoding-based Regression

📅 2025-01-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work introduces a new paradigm for regression with causal autoregressive language models (LMs): numerical prediction is framed as generating numeric string tokens (e.g., scientific-notation sequences), trained end to end with the standard token-level cross-entropy loss, unifying point estimation and density estimation. Theoretically, the paper proves that causal LMs, without architectural or loss-function modifications, can asymptotically approximate the optimal regression estimator and naturally model complex, multimodal, or heavy-tailed target distributions. Empirically, the approach achieves competitive MAE/RMSE on multiple standard tabular regression benchmarks, matching state-of-the-art traditional regressors, and significantly outperforms existing baselines on density estimation. The core contributions are twofold: (i) the first rigorous theoretical foundation for decoding-based regression with LMs, and (ii) empirical validation of LMs as general-purpose regression models capable of joint point and distributional prediction.
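The core mechanic is treating a real number as a short token sequence that an LM can emit. A minimal sketch of such a scheme, assuming a hypothetical sign/mantissa/exponent token vocabulary (the token names and mantissa length here are illustrative, not the paper's exact vocabulary):

```python
import math

# Hypothetical tokenization: a float becomes a sign token, fixed-length
# mantissa digit tokens, and an exponent token (base-10 scientific notation).
def num_to_tokens(x, mantissa_digits=4):
    sign = "+" if x >= 0 else "-"
    x = abs(x)
    exp = 0 if x == 0 else math.floor(math.log10(x))
    mant = 0.0 if x == 0 else x / 10 ** exp
    digits = f"{mant:.{mantissa_digits - 1}f}".replace(".", "")
    return [sign] + list(digits) + [f"E{exp}"]

# Inverse map: decode a generated token sequence back to a float.
def tokens_to_num(tokens):
    sign = 1.0 if tokens[0] == "+" else -1.0
    digits = tokens[1:-1]
    mant = int("".join(digits)) / 10 ** (len(digits) - 1)
    exp = int(tokens[-1][1:])  # strip the leading "E"
    return sign * mant * 10 ** exp

# Round trip: 6.283 -> ['+', '6', '2', '8', '3', 'E0'] -> 6.283
toks = num_to_tokens(6.283)
assert abs(tokens_to_num(toks) - 6.283) < 1e-9
```

Training then reduces to ordinary next-token prediction over these sequences, so no regression head or custom loss is needed; only the decoding step interprets the tokens as a number.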

📝 Abstract
Language models have recently been shown capable of performing regression tasks wherein numeric predictions are represented as decoded strings. In this work, we provide theoretical grounds for this capability and furthermore investigate the utility of causal auto-regressive sequence models when they are applied to any feature representation. We find that, despite being trained in the usual way - for next-token prediction via cross-entropy loss - decoding-based regression is as performant as traditional approaches for tabular regression tasks, while being flexible enough to capture arbitrary distributions, such as in the task of density estimation.
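Because the trained model defines a full distribution over numeric strings, repeatedly decoding yields samples from p(y|x), and a histogram of the decoded values serves as a density estimate. A toy illustration of that last step, using a stand-in sampler (a two-mode Gaussian mixture in place of a trained LM's decoder):

```python
import random
from collections import Counter

# Stand-in for LM decoding: in the paper's setup, each decoded numeric
# string is one sample from the model's learned p(y|x). A bimodal
# Gaussian mixture plays that role here, purely for illustration.
def sample_decoded_value():
    mu = random.choice([-2.0, 3.0])
    return random.gauss(mu, 0.5)

random.seed(0)
samples = [sample_decoded_value() for _ in range(10_000)]

# Histogram-based density estimate over fixed-width bins.
width = 0.25
counts = Counter(round(y / width) for y in samples)
density = {b * width: c / (len(samples) * width) for b, c in sorted(counts.items())}

# The squared-error-optimal point estimate is the sample mean; for a
# multimodal p(y|x) it can fall between modes, which is exactly why a
# distributional output is more informative than a single prediction.
mean = sum(samples) / len(samples)
```

The same set of decoded samples thus supports both tasks the abstract mentions: averaging gives the point prediction, while the histogram (or any density estimator over the samples) captures arbitrary target distributions.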
Problem

Research questions and friction points this paper is trying to address.

Language Models
Regression Learning
Tabular Data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal Autoregressive Language Model
Regression Tasks
Flexible Distribution Adaptation