🤖 AI Summary
This work addresses context-dependent, safety-critical control for complex world models that lack explicit dynamics. It proposes a sample-based Penalized Predictive Control (PPC) framework grounded in online Riemannian optimization. From feasibility samples generated by a black-box Simulator, the method fits a score-based conditional density whose induced Riemannian geometry on the action space guides the Planner's gradient descent. The barrier curvature κ(ξₜ), defined as the minimum curvature of the negative conditional log-density, governs both the convergence rate and the safety margin, replacing the Lipschitz constant of the unknown dynamics. The main theoretical result is a contextual safety bound: the distance from the true feasibility manifold is controlled by the score estimation error and a ratio that depends on κ(ξₜ), and both improve with richer context. Simulations on a dynamic navigation task show that contextual PPC substantially outperforms marginal and frozen-density baselines, with the advantage growing after environment shifts.
📝 Abstract
Modern world models are becoming too complex to admit explicit dynamical descriptions. We study safety-critical contextual control, where a Planner must optimize a task objective using only feasibility samples from a black-box Simulator, conditioned on a context signal $ξ_t$. We develop a sample-based Penalized Predictive Control (PPC) framework grounded in online Riemannian optimization, in which the Simulator compresses the feasibility manifold into a score-based density $\hat{p}(u \mid ξ_t)$ that endows the action space with a Riemannian geometry guiding the Planner's gradient descent. The barrier curvature $κ(ξ_t)$, the minimum curvature of the negative conditional log-density $-\ln\hat{p}(\cdot\midξ_t)$, governs both convergence rate and safety margin, replacing the Lipschitz constant of the unknown dynamics. Our main result is a contextual safety bound showing that the distance from the true feasibility manifold is controlled by the score estimation error and a ratio that depends on $κ(ξ_t)$, both of which improve with richer context. Simulations on a dynamic navigation task confirm that contextual PPC substantially outperforms marginal and frozen density models, with the advantage growing after environment shifts.
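To make the role of the barrier curvature concrete, the following is a minimal toy sketch and not the paper's algorithm. It assumes an isotropic Gaussian as a stand-in for the learned density $\hat{p}(u \mid ξ_t)$, a quadratic task objective, and plain Euclidean gradient descent in place of the Riemannian update. Every name in it (`score`, `curvature`, `ppc_step`, `plan`, `lam`) is hypothetical. Under these assumptions the curvature of $-\ln\hat{p}$ is $1/σ^2$, and it sets both the step size and how far the planned action may move away from the density's mode.

```python
# Toy sketch only: a Gaussian stand-in for the learned density p_hat(u | xi).
# All names (score, curvature, ppc_step, plan, lam) are hypothetical and are
# not taken from the paper.

def score(u, mu, sigma):
    """Score of an isotropic Gaussian: grad_u ln p(u | xi) = -(u - mu) / sigma^2."""
    return [-(ui - mi) / sigma**2 for ui, mi in zip(u, mu)]

def curvature(sigma):
    """Minimum curvature of -ln p for the isotropic Gaussian: kappa = 1 / sigma^2."""
    return 1.0 / sigma**2

def ppc_step(u, goal, mu, sigma, lam):
    """One gradient step on 0.5 * ||u - goal||^2 + lam * (-ln p(u | xi))."""
    kappa = curvature(sigma)
    # Step size is the inverse curvature of the penalized objective.
    step = 1.0 / (1.0 + lam * kappa)
    s = score(u, mu, sigma)
    # Gradient of the task term minus lam times the score (gradient of ln p).
    grad = [(ui - gi) - lam * si for ui, gi, si in zip(u, goal, s)]
    return [ui - step * gi for ui, gi in zip(u, grad)]

def plan(goal, mu, sigma, lam=1.0, iters=20):
    """Run penalized gradient descent from the origin for a fixed context."""
    u = [0.0] * len(goal)
    for _ in range(iters):
        u = ppc_step(u, goal, mu, sigma, lam)
    return u

goal = [2.0, 0.0]   # task objective pulls the action toward this point
mu = [0.0, 0.0]     # mode of the assumed feasibility density

tight = plan(goal, mu, sigma=0.5)  # high curvature: narrow feasible region
loose = plan(goal, mu, sigma=2.0)  # low curvature: wide feasible region
print(tight, loose)
```

In this toy setting the fixed point is $(g + λκμ)/(1 + λκ)$ for goal $g$ and mode $μ$, so the high-curvature context keeps the action near the mode (first coordinate about 0.4) while the low-curvature context lets it approach the goal (about 1.6).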