Online Risk-Averse Planning in POMDPs Using Iterated CVaR Value Function

📅 2026-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes an online POMDP planning framework based on Iterated Conditional Value-at-Risk (ICVaR), a dynamic risk measure, to control tail risk in partially observable environments. The authors develop an ICVaR policy evaluation algorithm with finite-time performance guarantees that do not depend on the size of the action space, then extend Sparse Sampling, PFT-DPW, and POMCPOW to optimize the ICVaR value function instead of the expected return. A risk parameter α sets the degree of risk aversion: α = 1 recovers standard expectation-based planning, and α < 1 induces increasing risk aversion. For ICVaR Sparse Sampling, finite-time guarantees under the risk-sensitive objective also enable an exploration strategy tailored to ICVaR. On benchmark POMDP domains, the ICVaR planners achieve lower tail risk than their risk-neutral counterparts.

📝 Abstract
We study risk-sensitive planning under partial observability using the dynamic risk measure Iterated Conditional Value-at-Risk (ICVaR). A policy evaluation algorithm for ICVaR is developed with finite-time performance guarantees that do not depend on the cardinality of the action space. Building on this foundation, three widely used online planning algorithms--Sparse Sampling, Particle Filter Trees with Double Progressive Widening (PFT-DPW), and Partially Observable Monte Carlo Planning with Observation Widening (POMCPOW)--are extended to optimize the ICVaR value function rather than the expectation of the return. Our formulations introduce a risk parameter $\alpha$, where $\alpha = 1$ recovers standard expectation-based planning and $\alpha<1$ induces increasing risk aversion. For ICVaR Sparse Sampling, we establish finite-time performance guarantees under the risk-sensitive objective, which further enable a novel exploration strategy tailored to ICVaR. Experiments on benchmark POMDP domains demonstrate that the proposed ICVaR planners achieve lower tail risk compared to their risk-neutral counterparts.
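To make the role of the risk parameter concrete, here is a minimal sketch of an iterated CVaR evaluation on a tiny hand-built outcome tree. It is not the paper's algorithm: the tree encoding, the names `cvar_lower` and `icvar_value`, and the convention that CVaR averages the worst (lowest-reward) α-fraction of outcomes are all assumptions made for illustration. It only shows how applying CVaR at every stage, instead of an expectation, changes the value of a fixed policy.

```python
def cvar_lower(values, probs, alpha):
    """CVaR at level alpha of a discrete reward distribution.

    Assumed convention: the mean of the worst (lowest-reward)
    alpha-fraction of probability mass. alpha = 1 gives the plain mean.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    remaining = alpha  # probability mass still to be collected from the tail
    total = 0.0
    for value, prob in sorted(zip(values, probs)):  # worst outcomes first
        take = min(prob, remaining)
        total += take * value
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


def icvar_value(node, alpha, gamma=1.0):
    """Iterated CVaR of a tree node under a fixed policy (hypothetical encoding).

    A node is (reward, children); children is a list of (probability, node).
    CVaR is applied at every stage to the children's values.
    """
    reward, children = node
    if not children:
        return reward
    probs = [p for p, _ in children]
    values = [icvar_value(child, alpha, gamma) for _, child in children]
    return reward + gamma * cvar_lower(values, probs, alpha)


# Illustrative tree: a safe branch and a branch with a rare large loss.
safe = (1.0, [(1.0, (1.0, []))])
risky = (0.0, [(0.9, (5.0, [])), (0.1, (-20.0, []))])
root = (0.0, [(0.5, safe), (0.5, risky)])

for alpha in (1.0, 0.5):
    print(alpha, icvar_value(root, alpha))
```

With α = 1 the recursion reduces to the ordinary expected return, so the risky branch looks slightly better than the safe one; lowering α weights the rare loss more heavily at each stage and the evaluated value drops accordingly.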
Problem

Research questions and friction points this paper is trying to address.

risk-averse planning
POMDPs
Conditional Value-at-Risk
online planning
tail risk
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterated CVaR
risk-averse planning
POMDP
online planning
finite-time guarantees
Yaacov Pariente
Faculty of Mathematics, Technion - Israel Institute of Technology
Vadim Indelman
Associate Professor, Technion
Robotics, Perception/SLAM, POMDP/Belief space planning, AI, Multi-robot systems