🤖 AI Summary
This work proposes the first online POMDP planning framework based on Iterated Conditional Value-at-Risk (ICVaR) to control tail risk in partially observable environments. A risk parameter α modulates risk aversion, and the authors extend Sparse Sampling, PFT-DPW, and POMCPOW to optimize the ICVaR objective. The key contribution is the integration of ICVaR, a dynamic risk measure, into online POMDP planning, together with a policy evaluation and exploration mechanism whose finite-time performance guarantees are independent of the action-space size. On standard POMDP benchmarks, the proposed ICVaR planners reduce tail risk in policy returns relative to risk-neutral baselines.
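
For reference, a standard form of the nested recursion behind a dynamic risk measure like ICVaR (notation is ours and may differ from the paper's exact formulation) is

$$
V_t(b) = \max_{a} \Big\{ r(b, a) + \gamma \, \mathrm{CVaR}_\alpha\big[ V_{t+1}(b') \big] \Big\},
\qquad
\mathrm{CVaR}_\alpha(Z) = \sup_{w \in \mathbb{R}} \Big\{ w - \tfrac{1}{\alpha}\, \mathbb{E}\big[(w - Z)^+\big] \Big\},
$$

where $b'$ is the successor belief. Setting $\alpha = 1$ gives $\mathrm{CVaR}_1(Z) = \mathbb{E}[Z]$ and recovers the risk-neutral Bellman recursion, while $\alpha \to 0$ approaches the worst case.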
📝 Abstract
We study risk-sensitive planning under partial observability using the dynamic risk measure Iterated Conditional Value-at-Risk (ICVaR). We develop a policy evaluation algorithm for ICVaR with finite-time performance guarantees that do not depend on the cardinality of the action space. Building on this foundation, we extend three widely used online planning algorithms, Sparse Sampling, Particle Filter Trees with Double Progressive Widening (PFT-DPW), and Partially Observable Monte Carlo Planning with Observation Widening (POMCPOW), to optimize the ICVaR value function rather than the expected return. Our formulations introduce a risk parameter $\alpha$: $\alpha = 1$ recovers standard expectation-based planning, and smaller values of $\alpha < 1$ induce greater risk aversion. For ICVaR Sparse Sampling, we establish finite-time performance guarantees under the risk-sensitive objective, which in turn enable a novel exploration strategy tailored to ICVaR. Experiments on benchmark POMDP domains demonstrate that the proposed ICVaR planners achieve lower tail risk than their risk-neutral counterparts.
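
To make the role of $\alpha$ concrete, the sketch below (an illustration under our own conventions, not the paper's algorithm) computes an empirical lower-tail CVaR of sampled returns and applies it stage-by-stage, which is the "iterated" part of ICVaR. The function names and the two-stage toy example are ours.

```python
import numpy as np

def cvar(samples, alpha):
    """Empirical lower-tail CVaR_alpha of return samples:
    the mean of the worst ceil(alpha * n) outcomes.
    alpha = 1.0 recovers the plain sample mean (risk-neutral)."""
    s = np.sort(np.asarray(samples, dtype=float))
    k = max(1, int(np.ceil(alpha * len(s))))
    return float(s[:k].mean())

# Two-stage toy tree: immediate reward plus CVaR of child returns,
# applied stage-by-stage rather than to the total return at once.
def icvar_two_stage(r0, child_returns, alpha):
    return r0 + cvar(child_returns, alpha)

returns = [1.0, 2.0, 3.0, 4.0]
print(cvar(returns, 1.0))   # risk-neutral: mean = 2.5
print(cvar(returns, 0.5))   # worst half: (1 + 2) / 2 = 1.5
print(icvar_two_stage(0.5, returns, 0.25))  # 0.5 + worst outcome 1.0 = 1.5
```

Shrinking $\alpha$ weights the planner toward the worst outcomes in each stage's return distribution, which is how the ICVaR planners trade expected return for reduced tail risk.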