LLM REgression with a Latent Iterative State Head

📅 2026-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inefficiency, excessive parameter overhead, and limited accuracy of large language models (LLMs) on text regression tasks by proposing RELISH, a novel architecture that introduces learnable latent states atop a frozen LLM backbone. These latent states are iteratively refined via cross-attention over token-level representations and ultimately fed into a small linear regression head to produce scalar predictions. RELISH is the first approach to integrate iterative latent-state modeling with cross-attention for LLM-based regression. With only a marginal increase of 0.01–0.04% in trainable parameters (approximately 3.4–3.7 million), it consistently outperforms existing methods across five datasets, four distinct LLM backbones, and two training paradigms, achieving substantial performance gains while maintaining minimal computational overhead.
📝 Abstract
We present RELISH (REgression with a Latent Iterative State Head), a novel, lightweight architecture designed for text regression with large language models. Rather than decoding numeric targets as text or aggregating multiple generated outputs, RELISH predicts scalar values directly from frozen LLM representations by iteratively refining a learned latent state through cross-attention over token-level representations, and then mapping the final state to a point estimate with a linear regressor. Across five datasets, four LLM backbones, and two LLM training regimes, RELISH consistently outperforms prior baselines from all three major LLM regression families, including autoregressive decoding, regression-aware inference, and existing predictive head methods. Despite these gains, RELISH remains highly parameter-efficient, requiring only 3.4-3.7M trainable parameters across frozen LLM backbones (only 0.01-0.04% additional overhead), far less than LoRA-based alternatives that grow with model size (0.26-0.42%).
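The abstract describes the core mechanism: a learned latent state is refined for a few iterations by cross-attending over frozen token-level representations, and the final state is mapped to a scalar by a linear regressor. The sketch below illustrates that flow in NumPy. It is a minimal single-head, single-state interpretation, not the authors' implementation; the function name, projection shapes, iteration count, and residual update are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def latent_iterative_head(tokens, z0, Wq, Wk, Wv, w_out, b_out, n_iters=3):
    """Sketch of a RELISH-style latent iterative state head (assumed design).

    tokens : (T, d) token-level representations from a frozen LLM
    z0     : (d,)   learnable initial latent state
    Wq, Wk, Wv : (d, d) trainable cross-attention projections
    w_out, b_out : trainable linear regressor mapping the final state to a scalar
    """
    z = z0
    for _ in range(n_iters):
        q = z @ Wq                                # query from the latent state
        k = tokens @ Wk                           # keys from token representations
        v = tokens @ Wv                           # values from token representations
        attn = softmax(k @ q / np.sqrt(len(q)))   # (T,) attention weights over tokens
        z = z + attn @ v                          # residual refinement of the state
    return float(z @ w_out + b_out)               # scalar point estimate

# Tiny usage example with random weights (illustrative only).
rng = np.random.default_rng(0)
d, T = 8, 5
y = latent_iterative_head(
    rng.normal(size=(T, d)), rng.normal(size=d),
    rng.normal(size=(d, d)) * 0.1, rng.normal(size=(d, d)) * 0.1,
    rng.normal(size=(d, d)) * 0.1, rng.normal(size=d) * 0.1, 0.0,
)
```

Because only `z0`, the three projections, and the output layer are trained while the LLM stays frozen, the trainable parameter count is independent of the backbone's depth, consistent with the paper's 3.4–3.7M figure.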
Problem

Research questions and friction points this paper is trying to address.

LLM regression
text regression
scalar prediction
parameter efficiency
latent state
Innovation

Methods, ideas, or system contributions that make the work stand out.

latent iterative state
text regression
frozen LLM
parameter-efficient
cross-attention
Yiheng Su
University of Texas at Austin
Matthew Lease
School of Information, University of Texas at Austin