Can Large Language Models Capture Human Risk Preferences? A Cross-Cultural Study

📅 2025-06-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the capacity of large language models (LLMs) to simulate cross-cultural human risk preferences. Using real-world urban transportation survey data from multiple countries, the authors construct lottery-choice tasks grounded in the Constant Relative Risk Aversion (CRRA) utility framework and incorporate demographic covariates. They evaluate ChatGPT-4o and o1-mini under both Chinese and English prompts, assessing predictive accuracy against observed human risk decisions. Results show that both models are significantly more risk-averse than human participants, with o1-mini achieving better calibration. Prediction error is markedly larger under Chinese prompts than English ones, indicating a systematic influence of prompt language on risk modeling. To the authors' knowledge, this is the first work to integrate authentic cross-cultural behavioral data with formal risk theory for evaluating LLM decision simulation, exposing critical limitations, particularly cultural insensitivity and language dependency, in LLM-based decision modeling. It establishes a benchmark and methodological foundation for trustworthy, culturally aware AI risk assessment.
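
For context, the CRRA specification referenced above takes the standard textbook form shown below; the paper's exact parameterization is not given in this summary, so treat this as a minimal sketch rather than the authors' estimating equation.

```latex
% Standard CRRA utility over a monetary outcome x > 0, where r is the
% coefficient of relative risk aversion
% (r > 0: risk-averse, r = 0: risk-neutral, r < 0: risk-seeking).
U(x) =
\begin{cases}
  \dfrac{x^{1-r}}{1-r}, & r \neq 1,\\[4pt]
  \ln x,               & r = 1.
\end{cases}
\qquad
EU = \sum_i p_i \, U(x_i)
```

A decision maker prefers whichever lottery yields the higher expected utility EU; comparing model-predicted choices with human choices under this criterion is what allows risk-aversion levels to be inferred and compared.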

📝 Abstract
Large language models (LLMs) have made significant strides, extending their applications to dialogue systems, automated content creation, and domain-specific advisory tasks. However, as their use grows, concerns have emerged regarding their reliability in simulating complex decision-making behavior, such as risky decision-making, where a single choice can lead to multiple outcomes. This study investigates the ability of LLMs to simulate risky decision-making scenarios. We compare model-generated decisions with actual human responses in a series of lottery-based tasks, using transportation stated-preference survey data from participants in Sydney, Dhaka, Hong Kong, and Nanjing. Demographic inputs were provided to two LLMs, ChatGPT-4o and ChatGPT o1-mini, which were tasked with predicting individual choices. Risk preferences were analyzed using the Constant Relative Risk Aversion (CRRA) framework. Results show that both models exhibit more risk-averse behavior than human participants, with o1-mini aligning more closely with observed human decisions. Further analysis of multilingual data from Nanjing and Hong Kong indicates that model predictions in Chinese deviate more from actual responses than those in English, suggesting that prompt language may influence simulation performance. These findings highlight both the promise and the current limitations of LLMs in replicating human-like risk behavior, particularly across linguistic and cultural settings.
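
As an illustration of how CRRA-based choice prediction and risk-coefficient recovery can work in practice, here is a minimal, self-contained sketch; the lottery payoffs, the grid search, and all function names are illustrative assumptions, not the paper's estimation procedure.

```python
import math

# One lottery: list of (probability, payoff) pairs. Payoffs assumed > 0.
Lottery = list[tuple[float, float]]

def crra_utility(x: float, r: float) -> float:
    """Standard CRRA utility; r is the coefficient of relative risk aversion."""
    return math.log(x) if abs(r - 1.0) < 1e-9 else x ** (1.0 - r) / (1.0 - r)

def expected_utility(lottery: Lottery, r: float) -> float:
    return sum(p * crra_utility(x, r) for p, x in lottery)

def predicted_choice(safe: Lottery, risky: Lottery, r: float) -> str:
    """Which option a CRRA agent with coefficient r would pick."""
    return "safe" if expected_utility(safe, r) >= expected_utility(risky, r) else "risky"

def fit_r(choices: list[tuple[Lottery, Lottery, str]], grid=None) -> float:
    """Grid-search the r that best reproduces observed choices
    (a stand-in for the structural estimation a paper like this would use)."""
    grid = grid or [i / 100 for i in range(-100, 201)]
    return max(grid, key=lambda r: sum(predicted_choice(s, k, r) == obs
                                       for s, k, obs in choices))

# Hypothetical lottery pair: a sure $50 vs. a 50/50 gamble between $10 and $100.
safe = [(1.0, 50.0)]
risky = [(0.5, 10.0), (0.5, 100.0)]
print(predicted_choice(safe, risky, r=0.5))  # moderately risk-averse agent picks "safe"
print(fit_r([(safe, risky, "safe")]))        # smallest grid r consistent with picking safe
```

Running the same fitting step separately on human choices and on model-generated choices yields two risk-aversion coefficients whose gap quantifies the "more risk-averse than humans" finding reported above.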
Problem

Research questions and friction points this paper is trying to address.

Assess LLMs' ability to simulate human risky decision-making behavior
Compare model-generated decisions with human responses across cultures
Evaluate impact of prompt language on risk preference simulation accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs simulate risky decisions using lottery tasks
CRRA framework analyzes model risk preferences
Multilingual prompts affect model decision accuracy (see the prompt sketch after this list)
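
To make the simulation setup concrete, below is a minimal sketch of how a demographic-conditioned lottery prompt could be assembled and sent to a chat model. The wording, demographic fields, and payoffs are illustrative assumptions, not the paper's actual questionnaire; the only external dependency assumed is the standard OpenAI Python SDK.

```python
# Illustrative prompt construction for a lottery-choice task.
# Demographic fields and wording are hypothetical, not the paper's protocol.
from openai import OpenAI  # assumes the standard OpenAI Python SDK

client = OpenAI()

def build_prompt(demo: dict, safe: str, risky: str, language: str = "en") -> str:
    if language == "zh":
        # A parallel Chinese template would drive the Nanjing / Hong Kong
        # comparisons; it is omitted from this sketch.
        raise NotImplementedError("Chinese template not shown in this sketch")
    return (
        f"You are a {demo['age']}-year-old {demo['gender']} living in "
        f"{demo['city']} with a monthly income of {demo['income']}.\n"
        f"Choose exactly one option and answer 'A' or 'B'.\n"
        f"Option A: {safe}\nOption B: {risky}"
    )

prompt = build_prompt(
    {"age": 34, "gender": "woman", "city": "Sydney", "income": "AUD 6,000"},
    safe="receive $50 for certain",
    risky="a 50% chance of $100 and a 50% chance of $10",
)
reply = client.chat.completions.create(
    model="gpt-4o",  # the paper also evaluates o1-mini
    messages=[{"role": "user", "content": prompt}],
)
print(reply.choices[0].message.content)  # expected: "A" or "B"
```

Collecting these A/B answers per simulated respondent, in both languages, is what allows the CRRA fitting step above to be applied to model output and compared against the human survey data.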