Cash or Comfort? How LLMs Value Your Inconvenience

📅 2025-06-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the reliability of large language models (LLMs) as personal utility proxies when financial incentives conflict with subjective discomfort, covering physical states (e.g., walking, hunger, pain) and psychological states (e.g., waiting). Using a behavioral economics framework, we design standardized discomfort–compensation trade-off tasks and run cross-model comparative experiments over multiple rounds of prompt variation. For the first time, we systematically quantify the monetary valuations of diverse discomfort types across 12 mainstream LLMs. Four critical deficiencies emerge: (1) extreme inter-model variance in valuations; (2) high sensitivity of decisions to minor prompt perturbations; (3) widespread acceptance of trivial compensation for substantial discomfort; and (4) unwarranted rejection of zero-cost beneficial actions. These findings demonstrate that current LLMs lack robust, interpretable utility modeling capabilities, rendering them unsuitable for deployment in sensitive, high-stakes personal decision-making contexts.
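
The summary describes the methodology only at a high level: standardized trade-off tasks posed to multiple models across discomfort types and compensation levels. The sketch below illustrates how such a task grid might be enumerated; the scenario set, offer ladder, prompt wording, and the `query_model` helper are all hypothetical stand-ins, not the paper's actual protocol.

```python
import itertools

# Hypothetical discomfort scenarios and compensation ladder; the paper's
# actual task set, wording, and amounts are not reproduced here.
DISCOMFORTS = {
    "walking": ["1 km", "5 km", "20 km"],
    "waiting": ["10 minutes", "2 hours", "10 hours"],
    "hunger":  ["skipping one meal", "fasting for a day"],
}
OFFERS_EUR = [0, 1, 10, 100, 1000]

PROMPT_TEMPLATE = (
    "You are acting on behalf of a user. Someone offers the user "
    "{offer} Euro in exchange for {level} of {kind}. "
    "Should the user accept? Answer only 'accept' or 'reject'."
)

def build_tasks():
    """Enumerate every (discomfort level, offer) trade-off as a prompt."""
    for kind, levels in DISCOMFORTS.items():
        for level, offer in itertools.product(levels, OFFERS_EUR):
            yield kind, level, offer, PROMPT_TEMPLATE.format(
                offer=offer, level=level, kind=kind
            )

def query_model(model_name: str, prompt: str) -> str:
    """Hypothetical stub: send `prompt` to `model_name` and return its reply.
    Wire this to whichever chat-completion API the model exposes."""
    raise NotImplementedError

if __name__ == "__main__":
    # Print the task grid; swap the print for query_model calls to run it.
    for kind, level, offer, prompt in build_tasks():
        print(f"[{kind} / {level} / {offer} EUR] {prompt}")
```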

📝 Abstract
Large Language Models (LLMs) are increasingly proposed as near-autonomous artificial intelligence (AI) agents capable of making everyday decisions on behalf of humans. Although LLMs perform well on many technical tasks, their behaviour in personal decision-making remains less understood. Previous studies have assessed their rationality and moral alignment with human decisions. However, the behaviour of AI assistants in scenarios where financial rewards are at odds with user comfort has not yet been thoroughly explored. In this paper, we tackle this problem by quantifying the prices assigned by multiple LLMs to a series of user discomforts: additional walking, waiting, hunger and pain. We uncover several key concerns that strongly question the prospect of using current LLMs as decision-making assistants: (1) a large variance in responses between LLMs, (2) within a single LLM, responses show fragility to minor variations in prompt phrasing (e.g., reformulating the question in the first person can considerably alter the decision), (3) LLMs can accept unreasonably low rewards for major inconveniences (e.g., 1 Euro to wait 10 hours), and (4) LLMs can reject monetary gains where no discomfort is imposed (e.g., 1,000 Euro to wait 0 minutes). These findings emphasize the need for scrutiny of how LLMs value human inconvenience, particularly as we move toward applications where such cash-versus-comfort trade-offs are made on users' behalf.
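
One plausible way to extract the "price" a model assigns to a discomfort, as reported above, is to locate its accept/reject switching point by bisecting over offer amounts. The sketch below assumes a hypothetical `accepts` predicate backed by a model query; it illustrates the idea, not the authors' elicitation method.

```python
def accepts(model_name: str, discomfort: str, offer_eur: float) -> bool:
    """Hypothetical predicate: does `model_name` accept `offer_eur` Euro
    as compensation for `discomfort`? (Backed by a model query in practice.)"""
    raise NotImplementedError

def indifference_price(model_name: str, discomfort: str,
                       lo: float = 0.0, hi: float = 10_000.0,
                       tol: float = 0.5) -> float:
    """Bisect over offers to approximate the smallest compensation the
    model accepts for `discomfort`, i.e. its implied price for that
    inconvenience. Assumes acceptance is monotone in the offer."""
    if accepts(model_name, discomfort, lo):
        return lo            # accepts even with no compensation
    if not accepts(model_name, discomfort, hi):
        return float("inf")  # rejects even the largest offer probed
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if accepts(model_name, discomfort, mid):
            hi = mid
        else:
            lo = mid
    return hi
```

Note that findings (3) and (4) in the abstract suggest acceptance need not actually be monotone in the offer, in which case a full sweep over the offer ladder is safer than bisection.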
Problem

Research questions and friction points this paper is trying to address.

Assessing LLMs' decision-making in cash-versus-comfort trade-offs
Quantifying LLMs' pricing of user discomforts like pain and waiting
Examining LLMs' inconsistency and irrationality in valuing inconvenience
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantify LLM-assigned prices for user discomforts
Assess variance and fragility in LLM responses (see the sketch after this list)
Evaluate LLM reward acceptance for inconveniences
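
The fragility assessment mentioned above could, for instance, be operationalized as a decision flip rate under paraphrase. The following sketch uses hypothetical paraphrase templates and a stub `decision` function; it mirrors the abstract's observation that first-person rephrasing can alter decisions, but is not the authors' exact procedure.

```python
# Minimal fragility check: pose the same trade-off in several phrasings
# (hypothetical templates, including a first-person variant) and measure
# how often the model's decision flips between phrasings.
PARAPHRASES = [
    "Someone offers you {offer} Euro to wait {duration}. Do you accept?",
    "A user is offered {offer} Euro to wait {duration}. Should they accept?",
    "I can get {offer} Euro if I wait {duration}. Should I take it?",
]

def decision(model_name: str, prompt: str) -> str:
    """Hypothetical stub returning 'accept' or 'reject' from the model."""
    raise NotImplementedError

def flip_rate(model_name: str, offer: int, duration: str) -> float:
    """Fraction of paraphrase pairs on which the decision differs."""
    answers = [decision(model_name, p.format(offer=offer, duration=duration))
               for p in PARAPHRASES]
    pairs = [(a, b) for i, a in enumerate(answers) for b in answers[i + 1:]]
    return sum(a != b for a, b in pairs) / len(pairs)
```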