🤖 AI Summary
This paper studies safety-critical online optimization under time-varying risk preferences in non-stationary environments, using Conditional Value-at-Risk (CVaR) as the risk measure. To capture the dynamics of risk preference, we introduce a novel *risk-level variation* metric alongside the standard function variation metric. We develop a unified dynamic regret framework applicable to both first-order (gradient) and zeroth-order (function-evaluation) feedback settings. Our algorithms jointly account for function variation and risk-level variation, yielding dynamic regret bounds in terms of both sources of non-stationarity and the sampling budget. Theoretically, we establish the algorithms' adaptivity to both environmental non-stationarity and risk sensitivity. Numerical experiments demonstrate their effectiveness in simultaneously handling risk aversion and environmental shifts.
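For reference, the standard Rockafellar–Uryasev formulation of CVaR at risk level $\alpha$, which the paper presumably builds on (its exact notation may differ), is

$$
\mathrm{CVaR}_{\alpha}\big[\ell(x,\xi)\big] \;=\; \min_{t\in\mathbb{R}}\;\Big\{\, t + \tfrac{1}{\alpha}\,\mathbb{E}\big[(\ell(x,\xi)-t)_{+}\big] \Big\},
$$

where $\ell(x,\xi)$ is the random loss at decision $x$, $(\cdot)_{+}=\max\{\cdot,0\}$, and smaller $\alpha$ corresponds to stronger risk aversion. The symbols $\ell$, $x$, $\xi$ are illustrative placeholders rather than the paper's notation.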
📝 Abstract
In safety-critical decision-making, the environment may evolve over time, and the learner adjusts its risk level accordingly. This work investigates risk-averse online optimization in dynamic environments with varying risk levels, using Conditional Value-at-Risk (CVaR) as the risk measure. To capture the dynamics of the environment and the risk levels, we employ the standard function variation metric and introduce a novel risk-level variation metric. Two information settings are considered: a first-order setting, in which the learner observes both function values and their gradients, and a zeroth-order setting, in which only function evaluations are available. For both settings, we develop risk-averse learning algorithms with a limited sampling budget and analyze their dynamic regret in terms of the function variation, the risk-level variation, and the total number of samples. The regret analysis demonstrates the adaptability of the algorithms in non-stationary and risk-sensitive settings. Finally, numerical experiments are presented to demonstrate the efficacy of the proposed methods.
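To make the zeroth-order feedback setting concrete, the sketch below shows a sample-average CVaR estimate and a simple one-point smoothed gradient estimator that uses only a fixed budget of function evaluations. This is a minimal illustration under assumed details: the noisy quadratic loss, the one-point estimator, the smoothing radius `delta`, the step size, and the sampling budget `num_samples` are all illustrative choices and do not reproduce the paper's algorithms.

```python
import numpy as np

def cvar_estimate(losses, alpha):
    """Empirical CVaR at level alpha: mean of the worst alpha-fraction of losses.

    Uses the Rockafellar-Uryasev sample formula, whose minimizing t is the
    empirical (1 - alpha)-quantile (the Value-at-Risk).
    """
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, 1.0 - alpha)              # empirical Value-at-Risk
    return var + np.mean(np.maximum(losses - var, 0.0)) / alpha

def zeroth_order_cvar_grad(loss_fn, x, alpha, num_samples=200, delta=0.05, rng=None):
    """One-point smoothed gradient estimate of x -> CVaR_alpha[loss_fn(x)].

    Draws a random direction u uniformly on the unit sphere, spends the
    evaluation budget at the perturbed point x + delta * u, and returns the
    classical single-point zeroth-order estimate (d / delta) * CVaR * u.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                               # uniform direction on the sphere
    losses = np.array([loss_fn(x + delta * u) for _ in range(num_samples)])
    return (d / delta) * cvar_estimate(losses, alpha) * u

# Illustrative usage: a noisy quadratic loss and one CVaR-gradient step.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy_loss = lambda x: float(np.sum(x**2) + rng.normal(scale=0.1))
    x = np.array([1.0, -2.0])
    g = zeroth_order_cvar_grad(noisy_loss, x, alpha=0.1, rng=rng)
    x_next = x - 0.01 * g    # plain gradient step; a projection would enforce feasibility
    print("CVaR-gradient step:", x_next)
```

In the first-order setting the sampled CVaR gradient could be used directly in place of the one-point estimator; the online algorithms in the paper additionally adapt their step sizes and risk levels to the function and risk-level variation, which this sketch omits.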