🤖 AI Summary
This paper studies safety-critical online optimization under time-varying risk preferences in non-stationary environments, using Conditional Value-at-Risk (CVaR) as the risk measure. To capture the dynamics of risk preference, we introduce a novel *risk-level variation* metric alongside the standard function variation metric. We develop a unified dynamic regret framework applicable to both first-order (gradient) and zeroth-order (function-evaluation) feedback settings. Our algorithms jointly account for function variation and risk-level variation, yielding dynamic regret bounds in terms of both sources of non-stationarity and the sampling budget. Theoretically, we establish the algorithms' adaptivity to both environmental non-stationarity and risk sensitivity. Numerical experiments demonstrate their effectiveness in simultaneously handling risk aversion and environmental shifts.
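For reference, the standard Rockafellar–Uryasev formulation of CVaR at risk level $\alpha$, which the paper presumably builds on (its exact notation may differ), is

$$
\mathrm{CVaR}_{\alpha}\big[\ell(x,\xi)\big] \;=\; \min_{t\in\mathbb{R}}\;\Big\{\, t + \tfrac{1}{\alpha}\,\mathbb{E}\big[(\ell(x,\xi)-t)_{+}\big] \Big\},
$$

where $\ell(x,\xi)$ is the random loss at decision $x$, $(\cdot)_{+}=\max\{\cdot,0\}$, and smaller $\alpha$ corresponds to stronger risk aversion. The symbols $\ell$, $x$, $\xi$ are illustrative placeholders rather than the paper's notation.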
📝 Abstract
In safety-critical decision-making, the environment may evolve over time, and the learner adjusts its risk level accordingly. This work investigates risk-averse online optimization in dynamic environments with varying risk levels, using Conditional Value-at-Risk (CVaR) as the risk measure. To capture the dynamics of the environment and the risk levels, we employ the standard function variation metric and introduce a novel risk-level variation metric. Two information settings are considered: a first-order setting, in which the learner observes both function values and their gradients, and a zeroth-order setting, in which only function evaluations are available. For both settings, we develop risk-averse learning algorithms with a limited sampling budget and analyze their dynamic regret in terms of the function variation, the risk-level variation, and the total number of samples. The regret analysis demonstrates the adaptability of the algorithms in non-stationary and risk-sensitive settings. Finally, numerical experiments are presented to demonstrate the efficacy of the proposed methods.
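To make the zeroth-order feedback setting concrete, the sketch below shows a sample-average CVaR estimate and a simple one-point smoothed gradient estimator that uses only a fixed budget of function evaluations. This is a minimal illustration under assumed details: the noisy quadratic loss, the one-point estimator, the smoothing radius `delta`, the step size, and the sampling budget `num_samples` are all illustrative choices and do not reproduce the paper's algorithms.

```python
import numpy as np

def cvar_estimate(losses, alpha):
    """Empirical CVaR at level alpha: mean of the worst alpha-fraction of losses.

    Uses the Rockafellar-Uryasev sample formula, whose minimizing t is the
    empirical (1 - alpha)-quantile (the Value-at-Risk).
    """
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, 1.0 - alpha)              # empirical Value-at-Risk
    return var + np.mean(np.maximum(losses - var, 0.0)) / alpha

def zeroth_order_cvar_grad(loss_fn, x, alpha, num_samples=200, delta=0.05, rng=None):
    """One-point smoothed gradient estimate of x -> CVaR_alpha[loss_fn(x)].

    Draws a random direction u uniformly on the unit sphere, spends the
    evaluation budget at the perturbed point x + delta * u, and returns the
    classical single-point zeroth-order estimate (d / delta) * CVaR * u.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                               # uniform direction on the sphere
    losses = np.array([loss_fn(x + delta * u) for _ in range(num_samples)])
    return (d / delta) * cvar_estimate(losses, alpha) * u

# Illustrative usage: a noisy quadratic loss and one CVaR-gradient step.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy_loss = lambda x: float(np.sum(x**2) + rng.normal(scale=0.1))
    x = np.array([1.0, -2.0])
    g = zeroth_order_cvar_grad(noisy_loss, x, alpha=0.1, rng=rng)
    x_next = x - 0.01 * g    # plain gradient step; a projection would enforce feasibility
    print("CVaR-gradient step:", x_next)
```

In the first-order setting the sampled CVaR gradient could be used directly in place of the one-point estimator; the online algorithms in the paper additionally adapt their step sizes and risk levels to the function and risk-level variation, which this sketch omits.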