🤖 AI Summary
This work addresses a key limitation of large language models (LLMs) in text-to-price prediction tasks: they output only point estimates, without uncertainty quantification. We propose the first end-to-end quantile regression framework tailored for LLMs, enabling estimation of the full predictive distribution. Methodologically, we fine-tune Mistral-7B with a multi-head quantile regression head; introduce distribution calibration evaluation using CRPS, DSS, and WIS; design an LLM-assisted noisy-label cleaning mechanism that achieves human-level label correction without systematic bias; and establish a scalable cross-dataset training paradigm. Key findings: Mistral-7B consistently outperforms encoder-based architectures and embedding-based methods in both point estimation accuracy and distribution calibration. On three diverse price prediction benchmarks, it reduces point estimation error by 19–34% and improves distribution calibration metrics by 27–41%. The code and a high-quality benchmark dataset are publicly released.
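To make the quantile-head objective concrete, the sketch below shows the standard pinball (quantile) loss that such a multi-head regression head is typically trained with. This is an illustrative NumPy implementation under assumed shapes, not the paper's released code; the quantile levels chosen are placeholders.

```python
import numpy as np

def pinball_loss(y_true, q_pred, quantiles):
    """Mean pinball (quantile) loss over samples and quantile levels.

    y_true:    (n,)   target values (e.g. prices)
    q_pred:    (n, k) predicted quantiles, one column per level
    quantiles: (k,)   quantile levels in (0, 1)
    """
    y = np.asarray(y_true, dtype=float).reshape(-1, 1)      # (n, 1)
    q = np.asarray(quantiles, dtype=float).reshape(1, -1)   # (1, k)
    err = y - np.asarray(q_pred, dtype=float)               # (n, k)
    # Under-prediction is weighted by q, over-prediction by (1 - q),
    # so minimizing the loss pushes each head toward its quantile.
    return float(np.mean(np.maximum(q * err, (q - 1.0) * err)))

# Example: a perfect prediction has zero loss.
levels = np.array([0.1, 0.5, 0.9])
print(pinball_loss([100.0], [[100.0, 100.0, 100.0]], levels))  # 0.0
```

Averaging this loss over several quantile levels is what lets a single fine-tuned model emit an approximate predictive distribution rather than one point estimate.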
📝 Abstract
Large Language Models (LLMs) have shown promise in structured prediction tasks, including regression, but existing approaches primarily produce point estimates and lack systematic comparison across methods. We investigate probabilistic regression with LLMs over unstructured inputs, addressing challenging text-to-distribution prediction tasks such as price estimation, where both nuanced text understanding and uncertainty quantification are critical. We propose a novel quantile regression approach that enables LLMs to produce full predictive distributions, improving upon traditional point estimates. Through extensive experiments across three diverse price prediction datasets, we demonstrate that a Mistral-7B model fine-tuned with quantile heads significantly outperforms traditional approaches in both point and distributional estimation, as measured by three established metrics each for prediction accuracy and distributional calibration. Our systematic comparison of LLM approaches, model architectures, training strategies, and data scaling reveals that Mistral-7B consistently outperforms encoder architectures, embedding-based methods, and few-shot learning methods. Our experiments also show that LLM-assisted label correction achieves human-level accuracy without systematic bias. Our curated datasets are made available at https://github.com/vnik18/llm-price-quantile-reg/ to support future research.
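One of the distributional calibration metrics named above, the weighted interval score (WIS), can be computed directly from a model's predicted quantiles. The sketch below follows the standard WIS decomposition over central prediction intervals; the specific interval levels are illustrative assumptions, not the paper's evaluation configuration.

```python
import numpy as np

def interval_score(lower, upper, y, alpha):
    """Interval score for a central (1 - alpha) prediction interval:
    width plus penalties, scaled by 2/alpha, when y falls outside."""
    return ((upper - lower)
            + (2.0 / alpha) * max(lower - y, 0.0)
            + (2.0 / alpha) * max(y - upper, 0.0))

def weighted_interval_score(median, lowers, uppers, alphas, y):
    """WIS over K central intervals plus the median's absolute error.

    lowers/uppers: bounds of each interval; alphas: their miscoverage
    levels (e.g. alpha = 0.2 for an 80% interval). Lower is better.
    """
    total = 0.5 * abs(y - median)
    for low, up, a in zip(lowers, uppers, alphas):
        total += (a / 2.0) * interval_score(low, up, y, a)
    return total / (len(alphas) + 0.5)

# Example: 50% interval (8, 12) and 90% interval (5, 15), median 10.
print(weighted_interval_score(10.0, [8.0, 5.0], [12.0, 15.0],
                              [0.5, 0.1], y=10.0))
```

Because WIS rewards both sharp (narrow) and well-covered intervals, it complements point-accuracy metrics when comparing predictive distributions.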