Noise, Adaptation, and Strategy: Assessing LLM Fidelity in Decision-Making

📅 2025-08-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates the behavioral-fidelity limitations of large language models (LLMs) in simulating human decision variability and adaptability. Method: We propose a process-oriented, three-tier intervention evaluation framework (intrinsic behavior, instruction-based prompting, and behavioral imitation) and conduct behavioral experiments on the second-price auction and the newsvendor problem to systematically assess LLM agents' capacity for dynamic strategy evolution under noise and external guidance. Contribution/Results: We identify that LLMs default to highly conservative, low-variability strategies, diverging markedly from well-documented human cognitive biases and strategic diversity. Incorporating human behavioral data partially narrows this gap but does not eliminate the fundamental discrepancies. Our work exposes intrinsic behavioral constraints of LLMs as social-science research agents and establishes a scalable, multi-level evaluation paradigm for developing high-fidelity cognitive agents.
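The paper frames the three tiers as progressively stronger interventions. As a rough illustration, the sketch below shows one way the tiers could be operationalized as prompt conditions; the `query_llm` helper, the template wording, and the numeric-reply assumption are hypothetical stand-ins, not the authors' released code.

```python
# Sketch of the three intervention tiers as prompt conditions.
# All names and templates here are illustrative assumptions.

INTERVENTION_TIERS = {
    # Tier 1: intrinsic behavior -- task description only, no guidance.
    "intrinsic": "{task}\nState your decision.",
    # Tier 2: instruction -- add an explicit risk framing.
    "instruction": "{task}\nAdopt a {risk_frame} strategy. State your decision.",
    # Tier 3: imitation -- prepend human decision traces for in-context learning.
    "imitation": ("Past human decisions in this task:\n{human_traces}\n"
                  "{task}\nState your decision."),
}

def run_tier(query_llm, tier: str, task: str, rounds: int,
             risk_frame: str = "risk-seeking", human_traces: str = "") -> list[float]:
    """Collect one decision per round under a given intervention tier.

    `query_llm` is a caller-supplied function mapping a prompt string to the
    model's reply; we assume the reply is a single number.
    """
    template = INTERVENTION_TIERS[tier]
    decisions = []
    for _ in range(rounds):
        prompt = template.format(task=task, risk_frame=risk_frame,
                                 human_traces=human_traces)
        decisions.append(float(query_llm(prompt)))
    return decisions
```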

📝 Abstract
Large language models (LLMs) are increasingly used in social science simulations. While their performance on reasoning and optimization tasks has been extensively evaluated, less attention has been paid to their ability to reproduce the variability and adaptability of human decision-making. We propose a process-oriented evaluation framework with progressive interventions (Intrinsicality, Instruction, and Imitation) to examine how LLM agents adapt under different levels of external guidance and human-derived noise. We validate the framework on two classic economics tasks, irrationality in the second-price auction and decision bias in the newsvendor problem, showing behavioral gaps between LLMs and humans. We find that LLMs, by default, converge on stable, conservative strategies that diverge from observed human behavior. Risk-framed instructions shift LLM behavior in predictable directions but do not replicate human-like diversity. Incorporating human data through in-context learning narrows the gap but does not reach human subjects' strategic variability. These results highlight a persistent alignment gap in behavioral fidelity and suggest that future LLM evaluations should weigh process-level realism. We present a process-oriented approach for assessing LLMs in dynamic decision-making tasks, offering guidance for their use in generating synthetic data for social science research.
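Both tasks have well-known normative benchmarks against which LLM and human behavior can be compared: truthful bidding is the weakly dominant strategy in a second-price auction, and the newsvendor problem has the classic critical-fractile solution. The snippet below computes these baselines; the specific price, cost, and demand-distribution values are illustrative assumptions, not necessarily the paper's experimental parameters.

```python
from scipy.stats import uniform

def newsvendor_optimal_order(price: float, cost: float, demand_dist) -> float:
    """Critical-fractile solution q* = F^-1(cu / (cu + co)), with
    underage cost cu = price - cost and overage cost co = cost
    (assuming zero salvage value). Humans famously deviate toward
    the mean demand ("pull-to-center" bias)."""
    cu, co = price - cost, cost
    return demand_dist.ppf(cu / (cu + co))

def second_price_dominant_bid(valuation: float) -> float:
    """In a second-price sealed-bid auction, bidding one's true valuation
    is weakly dominant; human subjects often overbid."""
    return valuation

# Illustrative parameters: demand ~ Uniform(0, 300), price 12, cost 3.
demand = uniform(loc=0, scale=300)
print(newsvendor_optimal_order(price=12, cost=3, demand_dist=demand))  # 225.0
print(second_price_dominant_bid(valuation=50.0))                       # 50.0
```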
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM adaptability in human decision-making simulations
Assessing behavioral gaps between LLM and human decision strategies
Developing process-oriented framework for LLM fidelity evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive intervention evaluation framework
Human-derived noise incorporation method
In-context learning adaptation technique (see the sketch after this list)
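The paper does not specify the imitation mechanism at code level here; a minimal sketch, assuming human decisions are injected as in-context examples with optional Gaussian jitter standing in for the human-derived-noise idea (all prompt wording is illustrative):

```python
import random

def build_imitation_prompt(task: str, human_decisions: list[float],
                           k: int = 5, noise_sd: float = 0.0) -> str:
    """Sample k recorded human decisions, optionally jitter them with
    Gaussian noise, and prepend them to the task as in-context examples."""
    examples = random.sample(human_decisions, k)
    if noise_sd > 0:
        examples = [x + random.gauss(0, noise_sd) for x in examples]
    lines = "\n".join(f"- A previous participant chose {x:.1f}" for x in examples)
    return (f"{task}\n\nDecisions made by previous human participants:\n"
            f"{lines}\n\nNow state your own decision as a single number.")

# Usage: feed the resulting prompt to the LLM agent under the imitation tier.
prompt = build_imitation_prompt("Order inventory for uncertain demand.",
                                human_decisions=[180.0, 210.0, 150.0, 240.0,
                                                 200.0, 175.0], noise_sd=5.0)
```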
Yuanjun Feng
University of Lausanne
Vivek Choudhary
Nanyang Technological University
Yash Raj Shrestha
University of Lausanne, Applied AI Lab
Applied AI · Human-AI Collaboration · Data-Driven Decision-Making · Organization Design