🤖 AI Summary
Existing studies lack a systematic analysis of how post-training shapes large language models' (LLMs) risk preferences in decision-making under uncertainty. Method: We propose the first integrated framework, grounded in behavioral economics and finance, for characterizing, steering, and controllably modulating the risk profiles of LLMs. Using utility-theoretic modeling, multi-stage comparative analysis (pretraining, instruction tuning, RLHF), prompt engineering, and in-context learning, we quantify risk behavior across training stages. Contribution/Results: Instruction-tuned models align best with standard expected utility theory, whereas pretrained and RLHF-tuned models deviate significantly from it. Crucially, post-training, and instruction tuning in particular, proves to be the most stable and quantifiably controllable intervention point for calibrating LLM risk preferences. This establishes a new paradigm for building trustworthy AI systems capable of principled, interpretable, and adjustable decision-making under uncertainty.
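As a concrete illustration of the utility-theoretic modeling step, the sketch below fits a CRRA (constant relative risk aversion) expected-utility model with a logistic choice rule to binary lottery choices by maximum likelihood, a standard construction in behavioral economics. The lottery menu, payoffs, and function names are illustrative assumptions, not the paper's actual protocol or code.

```python
# A minimal sketch of risk elicitation via expected utility theory: fit a CRRA
# utility function with a logistic choice rule to binary lottery choices by
# maximum likelihood. All payoffs and choices below are illustrative, not data
# from the paper.
import numpy as np
from scipy.optimize import minimize

# Each trial: a certain payoff vs. a p/(1-p) lottery, plus the observed choice
# (1 = chose the risky lottery, 0 = chose the safe payoff).
trials = [
    # (safe, high, low, p_high, chose_risky)
    (50.0, 100.0, 0.0, 0.5, 0),
    (30.0, 100.0, 0.0, 0.5, 1),
    (40.0, 100.0, 0.0, 0.5, 1),
    (60.0, 100.0, 0.0, 0.5, 0),
    (45.0, 100.0, 0.0, 0.5, 0),
]

def crra(x, r):
    """CRRA utility u(x) = x^(1-r) / (1-r), with log(x) at r = 1.
    r > 0 implies risk aversion, r = 0 risk neutrality, r < 0 risk seeking."""
    x = x + 1e-9  # avoid log(0) / 0^negative at zero payoffs
    return np.log(x) if np.isclose(r, 1.0) else x ** (1.0 - r) / (1.0 - r)

def neg_log_likelihood(params):
    r, temp = params[0], max(params[1], 1e-3)  # temp = choice-noise temperature
    nll = 0.0
    for safe, high, low, p, chose_risky in trials:
        eu_risky = p * crra(high, r) + (1.0 - p) * crra(low, r)
        eu_safe = crra(safe, r)
        # Logistic choice rule: P(risky) grows with the risky option's EU advantage.
        p_risky = 1.0 / (1.0 + np.exp(-(eu_risky - eu_safe) / temp))
        p_risky = np.clip(p_risky, 1e-9, 1.0 - 1e-9)
        nll -= chose_risky * np.log(p_risky) + (1 - chose_risky) * np.log(1.0 - p_risky)
    return nll

fit = minimize(neg_log_likelihood, x0=[0.5, 1.0], method="Nelder-Mead")
print(f"estimated risk aversion r = {fit.x[0]:.3f}, temperature = {fit.x[1]:.3f}")
```

Under this convention, a fitted r near zero indicates approximate risk neutrality, and systematic misfit between observed choices and the best-fitting utility curve is the kind of deviation the results attribute to pretrained and RLHF-tuned models.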
📝 Abstract
Large language models (LLMs) are increasingly used for decision-making tasks under uncertainty, yet their risk profiles, and how prompting and alignment methods shape them, remain underexplored. Existing studies have primarily examined personality prompting or multi-agent interactions, leaving open the question of how post-training influences the risk behavior of LLMs. In this work, we propose a new pipeline for eliciting, steering, and modulating LLMs' risk profiles, drawing on tools from behavioral economics and finance. Using utility-theoretic models, we compare pre-trained, instruction-tuned, and RLHF-aligned LLMs and find that, while instruction-tuned models exhibit behaviors consistent with some standard utility formulations, pre-trained and RLHF-aligned models deviate more substantially from every fitted utility model. We further evaluate modulation strategies, including prompt engineering, in-context learning, and post-training, and show that post-training provides the most stable and effective modulation of risk preferences. Our findings shed light on the risk profiles of different classes and stages of LLMs, demonstrate how post-training modulates these profiles, and lay the groundwork for future research on behavioral alignment and risk-aware LLM design.
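To make the prompt-based modulation concrete, here is a hedged sketch of one way a framing-based elicitation loop might look: the same lottery menu is posed under different system framings and risky-choice rates are compared across them. `query_llm`, the framings, and the lottery menu are hypothetical placeholders rather than the paper's prompts; the stub flips a coin so the script runs end to end and should be wired to a real chat API in practice.

```python
# Hedged sketch of prompt-based risk modulation: pose one lottery menu under
# several system framings and compare the rate of risky choices per framing.
import random
from collections import Counter

FRAMINGS = {
    "neutral":      "You are a decision maker.",
    "risk_averse":  "You are a cautious decision maker who avoids unnecessary risk.",
    "risk_seeking": "You are a bold decision maker who embraces risky bets.",
}

LOTTERY_PROMPT = (
    "Choose exactly one option and answer with a single letter, A or B.\n"
    "A: receive $50 for certain.\n"
    "B: 50% chance of $100, 50% chance of $0."
)

def query_llm(system_prompt: str, user_prompt: str) -> str:
    """Stub standing in for a real chat-completion call; here it just flips a
    coin so the example runs. Replace with your model API of choice."""
    return random.choice(["A", "B"])

def risky_choice_rate(framing: str, n_samples: int = 50) -> float:
    """Fraction of sampled responses that pick the risky lottery (option B)."""
    answers = Counter(
        query_llm(FRAMINGS[framing], LOTTERY_PROMPT).strip().upper()[:1]
        for _ in range(n_samples)
    )
    return answers["B"] / max(1, answers["A"] + answers["B"])

for name in FRAMINGS:
    print(f"{name:>12}: P(risky) = {risky_choice_rate(name):.2f}")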