🤖 AI Summary
This work addresses real-time failure avoidance for robots operating in hazardous out-of-distribution (OOD) scenarios. The proposed framework, FORTRESS, couples low-frequency multi-modal foundation-model reasoning with dynamics-aware motion planning. Instead of relying on hand-crafted fallback policies, it uses multi-modal reasoners during nominal operation to identify fallback goals and anticipate failure modes; when a runtime monitor triggers a fallback, it infers semantically unsafe regions and synthesizes dynamically feasible plans to the fallback goals that avoid those regions in real time. The key contribution is bridging open-world multi-modal reasoning with real-time, dynamics-aware planning, which removes the need for hard-coded fallbacks and human safety interventions. On synthetic benchmarks and real-world ANYmal robot data, FORTRESS achieves higher safety classification accuracy than on-the-fly prompting of slow reasoning models, and it improves system safety and planning success in simulation and on quadrotor hardware for urban navigation.
📝 Abstract
Foundation models can provide robust high-level reasoning on appropriate safety interventions in hazardous scenarios beyond a robot's training data, i.e., out-of-distribution (OOD) failures. However, due to the high inference latency of large vision and language models, current methods rely on manually defined intervention policies to enact fallbacks, and therefore cannot plan generalizable, semantically safe motions. To overcome these challenges, we present FORTRESS, a framework that generates and reasons about semantically safe fallback strategies in real time to prevent OOD failures. At a low frequency in nominal operations, FORTRESS uses multi-modal reasoners to identify goals and anticipate failure modes. When a runtime monitor triggers a fallback response, FORTRESS rapidly synthesizes plans to fallback goals while inferring and avoiding semantically unsafe regions in real time. By bridging open-world, multi-modal reasoning with dynamics-aware planning, we eliminate the need for hard-coded fallbacks and human safety interventions. FORTRESS outperforms on-the-fly prompting of slow reasoning models in safety classification accuracy on synthetic benchmarks and real-world ANYmal robot data, and further improves system safety and planning success in simulation and on quadrotor hardware for urban navigation.
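The two-rate structure described in the abstract (slow multi-modal reasoning during nominal operation, fast planning once a runtime monitor fires) can be illustrated with a minimal Python sketch. This is not the FORTRESS implementation: the names `FallbackCache`, `run_loop`, `slow_reasoner`, `monitor`, `planner`, and the `reason_every` interval are all hypothetical, and the reasoner, monitor, and planner are passed in as plain callables so that any model or planner could stand behind them.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class FallbackCache:
    """Hypothetical container for the output of the slow reasoning step."""
    goals: List[Point] = field(default_factory=list)
    failure_modes: List[str] = field(default_factory=list)


def run_loop(
    observations: Iterable[Any],
    slow_reasoner: Callable[[Any], FallbackCache],
    monitor: Callable[[Any], bool],
    planner: Callable[[Any, FallbackCache], Any],
    reason_every: int = 10,
) -> Optional[Tuple[int, Any]]:
    """Run a two-rate loop over a stream of observations.

    The slow reasoner is called only every `reason_every` steps and its
    result is cached. The monitor is checked at every step; when it fires,
    the planner is called once with the most recent cached result.
    Returns (step index, plan), or None if the monitor never fires.
    """
    cache = FallbackCache()
    for step, obs in enumerate(observations):
        if step % reason_every == 0:
            # Low-frequency path (assumed to be expensive, e.g. a model query).
            cache = slow_reasoner(obs)
        if monitor(obs):
            # High-frequency path: reuse cached goals and failure modes.
            return step, planner(obs, cache)
    return None
```

With stub callables, for example a monitor that fires when a scalar anomaly score exceeds a threshold, the loop returns the step at which the fallback was triggered together with whatever the planner produced from the cached goals. Keeping the slow call off the trigger path is the point of the design: the expensive reasoning has already happened by the time a plan is needed.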