🤖 AI Summary
Existing studies lack a systematic analysis of how post-training shapes large language models' (LLMs) risk preferences in decision-making under uncertainty. Method: We propose the first integrated framework, grounded in behavioral economics and finance, for characterizing, steering, and controllably modulating the risk profiles of LLMs. Using utility-theoretic modeling, multi-stage comparative analysis (pretraining, instruction tuning, RLHF), prompt engineering, and in-context learning, we quantify risk behavior across training stages. Contribution/Results: Instruction-tuned models align best with standard expected utility theory, whereas pretrained and RLHF-tuned models deviate significantly from it. Crucially, post-training, and instruction tuning in particular, proves to be the most stable and quantifiably controllable intervention point for calibrating LLM risk preferences. This establishes a new paradigm for building trustworthy AI systems capable of principled, interpretable, and adjustable decision-making under uncertainty.
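As a concrete illustration of the utility-theoretic modeling step, the sketch below fits a CRRA (constant relative risk aversion) expected-utility model with a logistic choice rule to binary lottery choices by maximum likelihood, a standard construction in behavioral economics. The lottery menu, payoffs, and function names are illustrative assumptions, not the paper's actual protocol or code.

```python
# A minimal sketch of risk elicitation via expected utility theory: fit a CRRA
# utility function with a logistic choice rule to binary lottery choices by
# maximum likelihood. All payoffs and choices below are illustrative, not data
# from the paper.
import numpy as np
from scipy.optimize import minimize

# Each trial: a certain payoff vs. a p/(1-p) lottery, plus the observed choice
# (1 = chose the risky lottery, 0 = chose the safe payoff).
trials = [
    # (safe, high, low, p_high, chose_risky)
    (50.0, 100.0, 0.0, 0.5, 0),
    (30.0, 100.0, 0.0, 0.5, 1),
    (40.0, 100.0, 0.0, 0.5, 1),
    (60.0, 100.0, 0.0, 0.5, 0),
    (45.0, 100.0, 0.0, 0.5, 0),
]

def crra(x, r):
    """CRRA utility u(x) = x^(1-r) / (1-r), with log(x) at r = 1.
    r > 0 implies risk aversion, r = 0 risk neutrality, r < 0 risk seeking."""
    x = x + 1e-9  # avoid log(0) / 0^negative at zero payoffs
    return np.log(x) if np.isclose(r, 1.0) else x ** (1.0 - r) / (1.0 - r)

def neg_log_likelihood(params):
    r, temp = params[0], max(params[1], 1e-3)  # temp = choice-noise temperature
    nll = 0.0
    for safe, high, low, p, chose_risky in trials:
        eu_risky = p * crra(high, r) + (1.0 - p) * crra(low, r)
        eu_safe = crra(safe, r)
        # Logistic choice rule: P(risky) grows with the risky option's EU advantage.
        p_risky = 1.0 / (1.0 + np.exp(-(eu_risky - eu_safe) / temp))
        p_risky = np.clip(p_risky, 1e-9, 1.0 - 1e-9)
        nll -= chose_risky * np.log(p_risky) + (1 - chose_risky) * np.log(1.0 - p_risky)
    return nll

fit = minimize(neg_log_likelihood, x0=[0.5, 1.0], method="Nelder-Mead")
print(f"estimated risk aversion r = {fit.x[0]:.3f}, temperature = {fit.x[1]:.3f}")
```

Under this convention, a fitted r near zero indicates approximate risk neutrality, and systematic misfit between observed choices and the best-fitting utility curve is the kind of deviation the results attribute to pretrained and RLHF-tuned models.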
📝 Abstract
Large language models (LLMs) are increasingly used for decision-making tasks under uncertainty, yet their risk profiles, and how prompting and alignment methods shape them, remain underexplored. Existing studies have primarily examined personality prompting or multi-agent interactions, leaving open the question of how post-training influences the risk behavior of LLMs. In this work, we propose a new pipeline for eliciting, steering, and modulating LLMs' risk profiles, drawing on tools from behavioral economics and finance. Using utility-theoretic models, we compare pre-trained, instruction-tuned, and RLHF-aligned LLMs and find that, while instruction-tuned models exhibit behaviors consistent with some standard utility formulations, pre-trained and RLHF-aligned models deviate more substantially from every fitted utility model. We further evaluate modulation strategies, including prompt engineering, in-context learning, and post-training, and show that post-training provides the most stable and effective modulation of risk preferences. Our findings shed light on the risk profiles of different classes and stages of LLMs, demonstrate how post-training modulates these profiles, and lay the groundwork for future research on behavioral alignment and risk-aware LLM design.
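To make the prompt-based modulation concrete, here is a hedged sketch of one way a framing-based elicitation loop might look: the same lottery menu is posed under different system framings and risky-choice rates are compared across them. `query_llm`, the framings, and the lottery menu are hypothetical placeholders rather than the paper's prompts; the stub flips a coin so the script runs end to end and should be wired to a real chat API in practice.

```python
# Hedged sketch of prompt-based risk modulation: pose one lottery menu under
# several system framings and compare the rate of risky choices per framing.
import random
from collections import Counter

FRAMINGS = {
    "neutral":      "You are a decision maker.",
    "risk_averse":  "You are a cautious decision maker who avoids unnecessary risk.",
    "risk_seeking": "You are a bold decision maker who embraces risky bets.",
}

LOTTERY_PROMPT = (
    "Choose exactly one option and answer with a single letter, A or B.\n"
    "A: receive $50 for certain.\n"
    "B: 50% chance of $100, 50% chance of $0."
)

def query_llm(system_prompt: str, user_prompt: str) -> str:
    """Stub standing in for a real chat-completion call; here it just flips a
    coin so the example runs. Replace with your model API of choice."""
    return random.choice(["A", "B"])

def risky_choice_rate(framing: str, n_samples: int = 50) -> float:
    """Fraction of sampled responses that pick the risky lottery (option B)."""
    answers = Counter(
        query_llm(FRAMINGS[framing], LOTTERY_PROMPT).strip().upper()[:1]
        for _ in range(n_samples)
    )
    return answers["B"] / max(1, answers["A"] + answers["B"])

for name in FRAMINGS:
    print(f"{name:>12}: P(risky) = {risky_choice_rate(name):.2f}")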