🤖 AI Summary
Real-time robot motion planning in dynamic human environments faces the challenge of simultaneously modeling uncertainty, strictly satisfying safety and kinematic constraints, and generating diverse, context-appropriate behaviors.
Method: This paper proposes a unified generation-refinement framework that couples a reward-guided Conditional Flow Matching (CFM) model with a Model Predictive Path Integral (MPPI) controller through a bidirectional information exchange. Samples from the CFM model serve as diverse, informed priors for MPPI, which refines them under explicit safety and kinematic constraints; the optimal trajectory from MPPI then warm-starts the next round of CFM generation. This closed loop lets the planner keep the multimodal diversity of the generative model while retaining the constraint handling of the optimizer.
Results: Evaluated on autonomous social navigation tasks, the method is reported to improve planning latency, safety compliance, and robustness to environmental dynamics, adapting to dynamic, human-centric scenes while satisfying safety requirements in real time.
📝 Abstract
Planning safe and effective robot behavior in dynamic, human-centric environments remains a core challenge because the planner must handle uncertainty, adapt in real time, and ensure safety. Optimization-based planners offer explicit constraint handling but rely on oversimplified initialization, which reduces solution quality. Learning-based planners better capture the multimodal space of possible solutions but struggle to enforce constraints such as safety. In this paper, we introduce a unified generation-refinement framework that bridges learning and optimization by combining a novel reward-guided conditional flow matching (CFM) model with model predictive path integral (MPPI) control. Our key innovation is a bidirectional information exchange: samples from the reward-guided CFM model provide informed priors for MPPI refinement, while the optimal trajectory from MPPI warm-starts the next CFM generation. Using autonomous social navigation as a motivating application, we demonstrate that our approach can flexibly adapt to dynamic environments and satisfy safety requirements in real time.
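
The generate, refine, and warm-start loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: `propose` is a hypothetical stand-in that only adds Gaussian noise around the warm start (a real reward-guided CFM sampler would integrate a learned flow), the robot is a 2D single integrator, and the cost, obstacle model, and parameters (`sigma`, `lam`, `radius`) are invented. Only the MPPI step, an exponentially weighted average of the sampled control sequences, follows the standard form of that update.

```python
import math
import random


def propose(warm_start, n_samples, sigma, rng):
    """Hypothetical placeholder for the reward-guided CFM sampler.

    Assumption: Gaussian perturbations around the warm-start control
    sequence stand in for samples drawn from a learned flow.
    """
    return [[(ux + rng.gauss(0.0, sigma), uy + rng.gauss(0.0, sigma))
             for ux, uy in warm_start]
            for _ in range(n_samples)]


def rollout(state, controls, dt=0.1):
    """Single-integrator rollout (assumed dynamics): position += velocity * dt."""
    x, y = state
    traj = []
    for ux, uy in controls:
        x, y = x + ux * dt, y + uy * dt
        traj.append((x, y))
    return traj


def cost(traj, goal, obstacle, radius=0.5):
    """Illustrative cost: terminal distance to goal plus a penalty per colliding state."""
    c = math.dist(traj[-1], goal)
    c += sum(100.0 for p in traj if math.dist(p, obstacle) < radius)
    return c


def mppi_refine(samples, costs, lam=1.0):
    """MPPI update: weight each sample by exp(-cost / lam) and average the controls."""
    c_min = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    total = sum(weights)
    horizon = len(samples[0])
    return [tuple(sum(w * s[t][k] for w, s in zip(weights, samples)) / total
                  for k in range(2))
            for t in range(horizon)]


def plan_step(state, warm_start, goal, obstacle, rng, n_samples=64, sigma=1.0):
    """One cycle: sample proposals, refine with MPPI, build the next warm start."""
    samples = propose(warm_start, n_samples, sigma, rng)
    costs = [cost(rollout(state, s), goal, obstacle) for s in samples]
    best = mppi_refine(samples, costs)
    # Shift the refined plan by one step so it can seed the next generation.
    next_warm_start = best[1:] + [best[-1]]
    return best, next_warm_start


if __name__ == "__main__":
    rng = random.Random(0)
    state, warm = (0.0, 0.0), [(0.0, 0.0)] * 10
    for _ in range(20):
        plan, warm = plan_step(state, warm, goal=(3.0, 0.0),
                               obstacle=(1.5, 1.0), rng=rng)
        state = rollout(state, plan[:1])[0]  # execute only the first control
    print(state)
```

In this sketch the feedback from optimizer to generator is just the shifted plan passed back into `propose`; how the paper conditions the CFM model on the MPPI solution is not specified in the abstract.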