Text-to-Distribution Prediction with Quantile Tokens and Neighbor Context

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two limitations of large language models in text-based regression: they struggle to model the full conditional distribution, and existing approaches provide neither a direct pathway from inputs to quantile outputs nor local contextual grounding. The authors propose an end-to-end distribution prediction framework built on quantile tokens: dedicated tokens inserted into the input sequence that give each quantile a direct input-output pathway through self-attention. The quantile tokens are augmented with retrieved, semantically similar neighbor instances and their empirical distributions, which ground predictions in local evidence. The authors also provide a theoretical analysis of loss functions for quantile regression. In experiments on the Inside Airbnb and StackSample datasets, the method lowers MAPE by about 4 points relative to baselines and produces prediction intervals about 2x narrower, with the largest gains on smaller and more challenging datasets.
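To make the neighbor-context idea concrete, the following is a minimal sketch of how an empirical distribution could be assembled from retrieved neighbors. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `bank` layout, the use of cosine similarity over precomputed embeddings, and the choice to pool neighbor outcomes before taking quantiles are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def empirical_quantile(values, level):
    """Linearly interpolated quantile of a non-empty sample; level is in [0, 1]."""
    xs = sorted(values)
    pos = level * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def neighbor_quantiles(query_emb, bank, k, levels):
    """Pool the outcomes of the k most similar instances and summarize them.

    bank is a list of (embedding, observed_outcomes) pairs. This layout is an
    assumption made for illustration; the paper does not specify it here.
    """
    ranked = sorted(bank, key=lambda item: cosine(query_emb, item[0]), reverse=True)
    pooled = [y for _, outcomes in ranked[:k] for y in outcomes]
    return [empirical_quantile(pooled, q) for q in levels]


# Toy example: two of the four stored instances point in nearly the same
# direction as the query, so only their outcomes enter the pooled sample.
bank = [
    ([1.0, 0.0], [100, 120, 140]),
    ([0.9, 0.1], [110, 130, 150]),
    ([0.0, 1.0], [10, 20, 30]),
    ([-1.0, 0.0], [500, 600, 700]),
]
summary = neighbor_quantiles([1.0, 0.0], bank, k=2, levels=[0.0, 0.5, 1.0])
print(summary)
```

In the described framework, neighbor instances and their empirical distributions are supplied to the model as context; how they are serialized into the input sequence is not specified in this summary.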

📝 Abstract

Many applications of LLM-based text regression require predicting a full conditional distribution rather than a single point value. We study distributional regression under empirical-quantile supervision, where each input is paired with multiple observed quantile outcomes, and the target distribution is represented by a dense grid of quantiles. We address two key limitations of current approaches: the lack of local grounding for distribution estimates, and the reliance on shared representations that create an indirect bottleneck between inputs and quantile outputs. In this paper, we introduce Quantile Token Regression, which, to our knowledge, is the first work to insert dedicated quantile tokens into the input sequence, enabling direct input-output pathways for each quantile through self-attention. We further augment these quantile tokens with retrieval, incorporating semantically similar neighbor instances and their empirical distributions to ground predictions with local evidence from similar instances. We also provide the first theoretical analysis of loss functions for quantile regression, clarifying which distributional objectives each optimizes. Experiments on the Inside Airbnb and StackSample benchmark datasets with LLMs ranging from 1.7B to 14B parameters show that quantile tokens with neighbors consistently outperform baselines (~4 points lower MAPE and 2x narrower prediction intervals), with especially large gains on smaller and more challenging datasets where quantile tokens produce substantially sharper and more accurate distributions.
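For readers unfamiliar with quantile regression objectives, the sketch below shows the standard pinball (quantile) loss averaged over a grid of quantile levels and a set of observed outcomes. It is a generic textbook formulation offered for orientation only; the abstract does not state which loss the authors adopt, and the function names are hypothetical.

```python
def pinball_loss(y, pred, level):
    """Pinball loss for a single outcome y and predicted quantile pred.

    Under-prediction (y > pred) is weighted by level, and over-prediction
    by (1 - level), so the minimizer is the level-quantile of y.
    """
    diff = y - pred
    return max(level * diff, (level - 1.0) * diff)


def mean_pinball(outcomes, preds, levels):
    """Average pinball loss over all (outcome, quantile level) pairs.

    preds[i] is the predicted quantile at levels[i]; outcomes are the
    observed values paired with one input (illustrative setup).
    """
    total = 0.0
    for pred, level in zip(preds, levels):
        for y in outcomes:
            total += pinball_loss(y, pred, level)
    return total / (len(outcomes) * len(levels))


# Toy check: at level 0.5 the loss is minimized by the sample median (3),
# not by the sample mean (22), which the outlier 100 pulls upward.
outcomes = [1, 2, 3, 4, 100]
loss_at_median = mean_pinball(outcomes, [3], [0.5])
loss_at_mean = mean_pinball(outcomes, [22], [0.5])
print(loss_at_median < loss_at_mean)
```

Predicting a dense grid of levels with this kind of objective yields one value per quantile, and the gap between a low and a high predicted quantile gives a prediction interval whose width can be compared across methods.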

Problem

Research questions and friction points this paper is trying to address.

distributional regression

quantile prediction

text-to-distribution

empirical quantiles

conditional distribution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantile Token Regression

distributional regression

neighbor retrieval