🤖 AI Summary
This study presents the first systematic approach to inferring willingness-to-pay (WTP) from large language models (LLMs) in subjective choice scenarios lacking objectively correct answers. By presenting travel-assistance dilemmas and leveraging multinomial logit modeling, role-based prompting, and lightweight preference conditioning, the authors quantify the implicit WTP embedded in LLM responses and benchmark it against human data. The findings reveal that larger-scale LLMs can generate meaningful WTP estimates, yet they consistently exhibit attribute-level systematic biases, generally overestimating human WTP. Notably, incorporating minimal user preference conditioning substantially improves the alignment between LLM-derived valuations and actual human behavior, enhancing the behavioral fidelity of LLM-based economic inference.
📝 Abstract
As Large Language Models (LLMs) are increasingly deployed in applications such as travel assistance and purchasing support, they are often required to make subjective choices on behalf of users in settings where no objectively correct answer exists. We study LLM decision-making in a travel-assistant context by presenting models with choice dilemmas and analyzing their responses using multinomial logit models to derive implied willingness to pay (WTP) estimates. These WTP values are subsequently compared to human benchmark values from the economics literature. In addition to a baseline setting, we examine how model behavior changes under more realistic conditions, including the provision of information about users' past choices and persona-based prompting. Our results show that while meaningful WTP values can be derived for larger LLMs, they also display systematic deviations at the attribute level. Additionally, they tend to overestimate human WTP overall, particularly when expensive options or business-oriented personas are introduced. Conditioning models on prior preferences for cheaper options yields valuations that are closer to human benchmarks. Overall, our findings highlight both the potential and the limitations of using LLMs for subjective decision support and underscore the importance of careful model selection, prompt design, and user representation when deploying such systems in practice.
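The core inference step described above, deriving implied WTP from discrete choices via a multinomial logit model, can be sketched in a few lines. The example below is a minimal illustration, not the authors' pipeline: it assumes hypothetical travel options described by two attributes (cost and travel time), simulates choices from known coefficients, recovers them by maximum likelihood, and reads off WTP as the ratio of the time coefficient to the cost coefficient.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical setup (assumption, for illustration only): each choice set has
# 3 travel options, each described by [cost in EUR, travel time in hours].
beta_true = np.array([-0.1, -1.5])  # true taste parameters: [cost, time]
# Implied WTP for time savings: (-1.5) / (-0.1) = 15 EUR per hour.

n_sets = 2000
X = rng.uniform([10.0, 1.0], [100.0, 10.0], size=(n_sets, 3, 2))

# Simulate choices from the multinomial logit choice probabilities.
util = X @ beta_true
probs = np.exp(util - util.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
choices = np.array([rng.choice(3, p=p) for p in probs])

def neg_log_lik(beta):
    """Negative log-likelihood of the observed choices under MNL."""
    u = X @ beta
    u = u - u.max(axis=1, keepdims=True)  # numerical stability
    log_p = u - np.log(np.exp(u).sum(axis=1, keepdims=True))
    return -log_p[np.arange(n_sets), choices].sum()

beta_hat = minimize(neg_log_lik, x0=np.zeros(2), method="BFGS").x

# WTP for an attribute = (marginal utility of attribute) / (marginal utility of cost).
wtp_time = beta_hat[1] / beta_hat[0]  # EUR per hour of travel time saved
print(f"estimated WTP for time: {wtp_time:.2f} EUR/hour")
```

In the paper's setting, the simulated chooser would be replaced by an LLM answering the same choice dilemmas, so that the fitted coefficients, and hence the WTP ratios, reflect the model's implicit valuations rather than a known ground truth.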