🤖 AI Summary
This work addresses the challenge of evaluating the quality of probability distributions generated by large language models (LLMs) in data-scarce settings, where decision-agnostic metrics such as Wasserstein distance can fail to reflect how useful a distribution is for downstream decision-making. The authors study LLM-SAA, an approach in which an LLM-generated distribution is plugged directly into a stochastic optimization problem and the decision is optimized under that distribution. Distributional quality is then assessed by the performance of the induced decisions rather than by statistical proximity. This is presented as the first effort to evaluate LLM-generated distributions by their impact on operational decisions. Experiments on three canonical problems (assortment optimization, pricing, and newsvendor) indicate that LLM-generated distributions are practically useful, especially in low-data regimes, and that a decision-oriented evaluation can reach different conclusions than distance-based metrics.
📝 Abstract
LLMs can generate a wealth of data, ranging from simulated personas imitating human valuations and preferences, to demand forecasts based on world knowledge. But how well do such LLM-generated distributions support downstream decision-making? For example, when pricing a new product, a firm could prompt an LLM to simulate how much consumers are willing to pay based on a product description, but how useful is the resulting distribution for optimizing the price? We refer to this approach as LLM-SAA, in which an LLM is used to construct an estimated distribution and the decision is then optimized under that distribution. In this paper, we study metrics to evaluate the quality of these LLM-generated distributions, based on the decisions they induce. Taking three canonical decision-making problems (assortment optimization, pricing, and newsvendor) as examples, we find that LLM-generated distributions are practically useful, especially in low-data regimes. We also show that decision-agnostic metrics such as Wasserstein distance can be misleading when evaluating these distributions for decision-making.
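To make the idea concrete, here is a minimal sketch of the pricing case, not the authors' implementation. It assumes a customer buys whenever their willingness to pay is at least the posted price, treats a list of sampled valuations as the LLM-generated distribution, and compares a decision-based score (revenue regret) with the 1-Wasserstein distance. All function names and numbers are hypothetical and chosen only for illustration.

```python
def expected_revenue(price, valuations):
    """Average revenue at `price` when a customer buys iff valuation >= price."""
    buyers = sum(1 for v in valuations if v >= price)
    return price * buyers / len(valuations)


def saa_price(valuations):
    """SAA pricing decision: the sampled valuation that maximizes sample revenue.

    For an empirical distribution, an optimal price always sits at one of the
    sampled valuations, so searching over them is sufficient.
    """
    return max(sorted(set(valuations)), key=lambda p: expected_revenue(p, valuations))


def wasserstein_1(xs, ys):
    """1-Wasserstein distance between two equal-size empirical samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def decision_regret(candidate, truth):
    """Revenue lost under `truth` by pricing optimally for `candidate` instead."""
    best = expected_revenue(saa_price(truth), truth)
    achieved = expected_revenue(saa_price(candidate), truth)
    return best - achieved


# Toy numbers, invented for illustration only (not from the paper).
truth = [10.0, 10.0, 10.0, 10.0]
sample_a = [11.0, 11.0, 11.0, 11.0]  # closer to truth in Wasserstein distance
sample_b = [2.0, 10.0, 10.0, 10.0]   # farther from truth in Wasserstein distance

for name, sample in [("A", sample_a), ("B", sample_b)]:
    print(
        name,
        "W1 =", wasserstein_1(sample, truth),
        "price =", saa_price(sample),
        "regret =", decision_regret(sample, truth),
    )
```

In this toy case, sample A is closer to the truth in Wasserstein distance (1 versus 2) yet induces a price of 11 that no customer accepts, while sample B induces the optimal price of 10 and has zero regret. This is one simple way a decision-agnostic metric can rank distributions differently from the decisions they produce.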