🤖 AI Summary
In dense, dynamic crowd environments, global path planning often fails, while local planners struggle to balance real-time performance with long-horizon rationality. To address this, we propose a two-stage local navigation framework that avoids explicit obstacle prediction. First, a conditional Vector Quantized Variational Autoencoder (VQ-VAE) models expert trajectory priors from raw perception inputs, enabling robust initialization. Second, lightweight runtime trajectory refinement is performed via sampling-based optimization, specifically Model Predictive Path Integral (MPPI) control or the Cross-Entropy Method (CEM). This learning-plus-optimization co-design enhances both planning quality and environmental adaptability. Experiments demonstrate a 40% improvement in task success rate and a 6% reduction in traversal time over DRL-VO. Crucially, the method maintains interactive real-time performance (>10 Hz) and high robustness even under sudden scene layout changes.
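To make the second stage concrete, here is a minimal Cross-Entropy Method sketch that refines an initial waypoint trajectory (standing in for a VQ-VAE prior sample). The trajectory shape, cost terms, obstacle, and hyperparameters are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def cem_refine(init_traj, cost_fn, n_iters=10, n_samples=64, n_elites=8, sigma=0.5):
    """Cross-Entropy Method: refine an initial waypoint trajectory (H x 2)
    by repeatedly fitting a Gaussian to the lowest-cost samples."""
    mean, std = init_traj.copy(), np.full_like(init_traj, sigma)
    best, best_cost = init_traj.copy(), cost_fn(init_traj)
    for _ in range(n_iters):
        # Perturb the current mean to generate candidate trajectories.
        samples = mean + std * rng.standard_normal((n_samples, *mean.shape))
        costs = np.array([cost_fn(s) for s in samples])
        if costs.min() < best_cost:
            best, best_cost = samples[costs.argmin()], costs.min()
        # Refit the sampling distribution to the elite (lowest-cost) set.
        elites = samples[np.argsort(costs)[:n_elites]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return best

# Toy cost: end near the goal while keeping waypoints clear of one obstacle.
goal, obstacle = np.array([2.0, 2.0]), np.array([1.0, 1.0])

def cost_fn(traj):
    clearance = np.linalg.norm(traj - obstacle, axis=1)
    return np.linalg.norm(traj[-1] - goal) + np.maximum(0.0, 0.3 - clearance).sum()

init = np.linspace([0.0, 0.0], [2.0, 2.0], 20)  # stand-in for a prior sample
refined = cem_refine(init, cost_fn)
```

Starting from a good prior sample means only a few CEM iterations are needed, which is what keeps the refinement fast enough for interactive rates.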
📝 Abstract
Navigation amongst densely packed crowds remains a challenge for mobile robots. The complexity increases further if the environment layout changes, making the previously computed global plan infeasible. In this paper, we show that it is possible to dramatically enhance crowd navigation by improving just the local planner. Our approach combines generative modelling with inference-time optimization to generate sophisticated long-horizon local plans at interactive rates. More specifically, we train a Vector Quantized Variational Autoencoder (VQ-VAE) to learn a prior over the expert trajectory distribution, conditioned on the perception input. At run time, this prior is used to initialize a sampling-based optimizer for further refinement. Our approach does not require any sophisticated prediction of dynamic obstacles and yet provides state-of-the-art performance. In particular, we compare against the recent DRL-VO approach and show a 40% improvement in success rate and a 6% improvement in travel time.
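The defining operation of a VQ-VAE is vector quantization: each continuous latent produced by the encoder is snapped to its nearest entry in a learned codebook. Below is a minimal NumPy sketch of that lookup step alone (the codebook values and latents are made up for illustration; the paper's model, conditioning, and training are not reproduced here):

```python
import numpy as np

def quantize(z, codebook):
    """Vector-quantization step: snap each latent vector to its nearest
    codebook entry under squared Euclidean distance."""
    # Pairwise squared distances between latents (N x D) and codes (K x D).
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)        # index of nearest code per latent
    return codebook[idx], idx

# Toy codebook of 3 codes and 2 encoder outputs.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
z = np.array([[0.1, -0.2], [0.9, 1.1]])
z_q, idx = quantize(z, codebook)      # idx -> [0, 1]
```

Because the decoder only ever sees codebook entries, sampling discrete codes (conditioned on perception) yields trajectories drawn from the learned expert prior, which is what provides the optimizer's initialization.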