🤖 AI Summary
This work proposes the first online POMDP planning framework based on Iterated Conditional Value-at-Risk (ICVaR) to control tail risk in partially observable environments. A risk parameter α modulates risk aversion, and the authors extend Sparse Sampling, PFT-DPW, and POMCPOW to optimize the ICVaR objective. The key contribution is the integration of ICVaR, a dynamic risk measure, into online POMDP planning, together with a policy evaluation and exploration mechanism whose finite-time performance guarantees are independent of the action-space size. On standard POMDP benchmarks, the proposed ICVaR planners reduce tail risk in policy returns relative to risk-neutral baselines.
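
For reference, a standard form of the nested recursion behind a dynamic risk measure like ICVaR (notation is ours and may differ from the paper's exact formulation) is

$$
V_t(b) = \max_{a} \Big\{ r(b, a) + \gamma \, \mathrm{CVaR}_\alpha\big[ V_{t+1}(b') \big] \Big\},
\qquad
\mathrm{CVaR}_\alpha(Z) = \sup_{w \in \mathbb{R}} \Big\{ w - \tfrac{1}{\alpha}\, \mathbb{E}\big[(w - Z)^+\big] \Big\},
$$

where $b'$ is the successor belief. Setting $\alpha = 1$ gives $\mathrm{CVaR}_1(Z) = \mathbb{E}[Z]$ and recovers the risk-neutral Bellman recursion, while $\alpha \to 0$ approaches the worst case.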
📝 Abstract
We study risk-sensitive planning under partial observability using the dynamic risk measure Iterated Conditional Value-at-Risk (ICVaR). We develop a policy evaluation algorithm for ICVaR with finite-time performance guarantees that do not depend on the cardinality of the action space. Building on this foundation, we extend three widely used online planning algorithms, Sparse Sampling, Particle Filter Trees with Double Progressive Widening (PFT-DPW), and Partially Observable Monte Carlo Planning with Observation Widening (POMCPOW), to optimize the ICVaR value function rather than the expected return. Our formulations introduce a risk parameter $\alpha$: $\alpha = 1$ recovers standard expectation-based planning, and smaller values of $\alpha < 1$ induce greater risk aversion. For ICVaR Sparse Sampling, we establish finite-time performance guarantees under the risk-sensitive objective, which in turn enable a novel exploration strategy tailored to ICVaR. Experiments on benchmark POMDP domains demonstrate that the proposed ICVaR planners achieve lower tail risk than their risk-neutral counterparts.
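
To make the role of $\alpha$ concrete, the sketch below (an illustration under our own conventions, not the paper's algorithm) computes an empirical lower-tail CVaR of sampled returns and applies it stage-by-stage, which is the "iterated" part of ICVaR. The function names and the two-stage toy example are ours.

```python
import numpy as np

def cvar(samples, alpha):
    """Empirical lower-tail CVaR_alpha of return samples:
    the mean of the worst ceil(alpha * n) outcomes.
    alpha = 1.0 recovers the plain sample mean (risk-neutral)."""
    s = np.sort(np.asarray(samples, dtype=float))
    k = max(1, int(np.ceil(alpha * len(s))))
    return float(s[:k].mean())

# Two-stage toy tree: immediate reward plus CVaR of child returns,
# applied stage-by-stage rather than to the total return at once.
def icvar_two_stage(r0, child_returns, alpha):
    return r0 + cvar(child_returns, alpha)

returns = [1.0, 2.0, 3.0, 4.0]
print(cvar(returns, 1.0))   # risk-neutral: mean = 2.5
print(cvar(returns, 0.5))   # worst half: (1 + 2) / 2 = 1.5
print(icvar_two_stage(0.5, returns, 0.25))  # 0.5 + worst outcome 1.0 = 1.5
```

Shrinking $\alpha$ weights the planner toward the worst outcomes in each stage's return distribution, which is how the ICVaR planners trade expected return for reduced tail risk.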