Evaluating LLM-persona Generated Distributions for Decision-making

📅 2026-02-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work examines how to evaluate probability distributions generated by large language models (LLMs) when real data is scarce, a setting in which decision-agnostic metrics such as Wasserstein distance can fail to reflect how useful a distribution is for downstream decisions. The authors study LLM-SAA, an approach in which an LLM constructs an estimated distribution and the decision is then optimized under it, and they assess distributional quality by the decisions it induces rather than by statistical proximity alone. Across three canonical problems (assortment optimization, pricing, and newsvendor), they find that LLM-generated distributions are practically useful, especially in low-data regimes, which supports a decision-oriented approach to distribution evaluation.
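To make the gap between statistical proximity and decision quality concrete, here is a minimal sketch for a pricing problem. The willingness-to-pay numbers and the helper names (`w1_distance`, `revenue`, `best_price`) are hypothetical, chosen only for illustration and not taken from the paper. The candidate that is closer in Wasserstein distance induces the worse price.

```python
import numpy as np

def w1_distance(a, b):
    """1-D Wasserstein-1 distance between two equal-size samples."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    return float(np.mean(np.abs(a - b)))

def revenue(price, wtp):
    """Expected revenue per consumer: price times the share who would buy."""
    wtp = np.asarray(wtp, float)
    return float(price * np.mean(wtp >= price))

def best_price(wtp):
    """Revenue-maximizing price under the empirical distribution of `wtp`.

    For an empirical distribution the optimum is always one of the sample values.
    """
    candidates = np.unique(wtp)
    return float(max(candidates, key=lambda p: revenue(p, wtp)))

# Toy data (assumed, not from the paper): true willingness to pay and two estimates.
true_wtp = [10, 10, 10, 20]
cand_a = [11, 11, 11, 20]   # close to the truth, but overshoots the mass at 10
cand_b = [8, 8, 8, 30]      # farther from the truth, but errs on the safe side

for name, cand in [("A", cand_a), ("B", cand_b)]:
    p = best_price(cand)
    print(name, "W1 =", w1_distance(true_wtp, cand),
          "price =", p, "true revenue =", revenue(p, true_wtp))
```

Candidate A prices just above what most consumers will pay and loses most of the sales, while candidate B underprices slightly and keeps them. A metric that ignores the decision cannot see this asymmetry.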

📝 Abstract

LLMs can generate a wealth of data, ranging from simulated personas imitating human valuations and preferences, to demand forecasts based on world knowledge. But how well do such LLM-generated distributions support downstream decision-making? For example, when pricing a new product, a firm could prompt an LLM to simulate how much consumers are willing to pay based on a product description, but how useful is the resulting distribution for optimizing the price? We refer to this approach as LLM-SAA, in which an LLM is used to construct an estimated distribution and the decision is then optimized under that distribution. In this paper, we study metrics to evaluate the quality of these LLM-generated distributions, based on the decisions they induce. Taking three canonical decision-making problems (assortment optimization, pricing, and newsvendor) as examples, we find that LLM-generated distributions are practically useful, especially in low-data regimes. We also show that decision-agnostic metrics such as Wasserstein distance can be misleading when evaluating these distributions for decision-making.
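The optimize-under-an-estimated-distribution step can be sketched for the newsvendor problem as follows. This is an illustration under stated assumptions, not the authors' implementation: the profit model (no salvage value), the function names, and the synthetic "LLM" samples are all hypothetical stand-ins, since the abstract does not specify how samples are elicited or how decisions are scored.

```python
import numpy as np

def newsvendor_saa(samples, price, cost):
    """Order quantity maximizing average profit over the given demand samples.

    Assumes profit = price * min(q, demand) - cost * q, with price > cost >= 0.
    The sample-average optimum is the critical-ratio quantile of the samples.
    """
    samples = np.sort(np.asarray(samples, dtype=float))
    critical_ratio = (price - cost) / price
    k = int(np.ceil(critical_ratio * len(samples))) - 1  # 0-based order statistic
    return float(samples[max(k, 0)])

def expected_profit(q, demand, price, cost):
    """Average profit of ordering q units against a set of demand draws."""
    demand = np.asarray(demand, dtype=float)
    return float(np.mean(price * np.minimum(q, demand) - cost * q))

# Synthetic setup (assumed): a "true" demand distribution we can evaluate against,
# a handful of real observations, and a larger but biased LLM-style sample.
rng = np.random.default_rng(0)
price, cost = 4.0, 1.0
true_demand = rng.gamma(shape=4.0, scale=25.0, size=100_000)
real_obs = rng.gamma(shape=4.0, scale=25.0, size=5)           # low-data regime
llm_samples = rng.gamma(shape=4.0, scale=25.0, size=500) * 1.1  # stand-in for LLM output

for name, data in [("real data (n=5)", real_obs), ("LLM samples", llm_samples)]:
    q = newsvendor_saa(data, price, cost)
    print(name, "order =", round(q, 1),
          "true profit =", round(expected_profit(q, true_demand, price, cost), 2))
```

Scoring each source by the true profit of the order quantity it induces, rather than by its distance to the true distribution, is the kind of decision-based evaluation the abstract describes.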

Problem

Research questions and friction points this paper is trying to address.

LLM-generated distributions

decision-making

evaluation metrics

low-data regimes

LLM-SAA

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-SAA

decision-making

distribution evaluation