VORL-EXPLORE: A Hybrid Learning Planning Approach to Multi-Robot Exploration in Dynamic Environments

📅 2026-03-09

🤖 AI Summary

This work addresses a weakness of multi-robot exploration systems that decouple task allocation from local navigation, a design that often leads to robot clustering, frequent replanning, and redundant coverage. The authors propose a hybrid learning-and-planning framework that tightly couples task assignment with motion execution by introducing “execution fidelity,” a shared estimate of local navigability. This fidelity signal is integrated into Voronoi-based goal assignment with mutual-exclusion mechanisms that mitigate inter-robot conflicts. In addition, a risk-aware arbitration strategy adaptively fuses global A* planning with a reactive reinforcement learning policy to balance global efficiency against local safety. The system also supports online self-supervised recalibration of the fidelity model via pseudo-labeling, enabling adaptation to environments with dynamic obstacles without manual parameter tuning. Experiments on random grid maps and in a Gazebo-simulated factory scenario show high success rates, shorter paths, lower coverage overlap, and robust collision avoidance.
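Neither the summary nor the abstract gives the exact form of the fidelity-coupled Voronoi objective, so the Python sketch below is only an illustration under stated assumptions, not the authors' method. It assumes one fidelity value in (0, 1] per frontier (higher means easier to reach), divides Euclidean distance by that fidelity to get an effective distance, partitions frontiers Voronoi-style by effective distance, and then lets robots pick goals one at a time so that no frontier is claimed twice (mutual exclusion) while an exponential penalty pushes later picks away from goals already claimed (repulsion). The names `assign_frontiers`, `repulsion_weight`, `repulsion_scale`, and `min_fidelity` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def assign_frontiers(
    robots: Dict[str, Point],
    frontiers: List[Point],
    fidelity: Dict[Point, float],
    repulsion_weight: float = 2.0,   # hypothetical penalty weight
    repulsion_scale: float = 3.0,    # hypothetical decay length of the repulsion
    min_fidelity: float = 0.05,      # floor that avoids division by zero
) -> Dict[str, Point]:
    """Hypothetical fidelity-coupled Voronoi assignment with mutual exclusion."""

    def effective_dist(pos: Point, goal: Point) -> float:
        # Low fidelity (hard-to-navigate goal region) inflates the effective distance.
        return dist(pos, goal) / max(fidelity.get(goal, 1.0), min_fidelity)

    # 1) Voronoi partition: each frontier is owned by the robot that reaches it most
    #    cheaply; ties go to the robot listed first in `robots`.
    cells: Dict[str, List[Point]] = {name: [] for name in robots}
    for g in frontiers:
        owner = min(robots, key=lambda n: effective_dist(robots[n], g))
        cells[owner].append(g)

    # 2) Sequential selection: a claimed goal cannot be reused (mutual exclusion),
    #    and goals near already-claimed ones are penalized (repulsion).
    assignment: Dict[str, Point] = {}
    claimed: List[Point] = []
    for name in sorted(robots):
        pool = [g for g in cells[name] if g not in claimed]
        if not pool:  # empty cell: fall back to any unclaimed frontier
            pool = [g for g in frontiers if g not in claimed]
        if not pool:
            continue  # nothing left to assign to this robot

        def score(g: Point) -> float:
            repulsion = sum(math.exp(-dist(g, c) / repulsion_scale) for c in claimed)
            return effective_dist(robots[name], g) + repulsion_weight * repulsion

        best = min(pool, key=score)
        assignment[name] = best
        claimed.append(best)
    return assignment
```

For example, with robots at (0, 0) and (10, 0) and frontiers at (1, 0), (9, 0), and (5, 5), the first robot takes (1, 0); lowering the fidelity of (1, 0) to 0.05 makes it look twenty times farther away, and the same robot switches to (5, 5). A real system would more likely use grid path length instead of Euclidean distance and take fidelity from a learned model rather than a fixed table.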

📝 Abstract

Hierarchical multi-robot exploration commonly decouples frontier allocation from local navigation, which can make the system brittle in dense and dynamic environments. Because the allocator lacks direct awareness of execution difficulty, robots may cluster at bottlenecks, trigger oscillatory replanning, and generate redundant coverage. We propose VORL-EXPLORE, a hybrid learning and planning framework that addresses this limitation through execution fidelity, a shared estimate of local navigability that couples task allocation with motion execution. This fidelity signal is incorporated into a fidelity-coupled Voronoi objective with inter-robot repulsion to reduce contention before it emerges. It also drives a risk-aware adaptive arbitration mechanism between global A* guidance and a reactive reinforcement learning policy, balancing long-range efficiency with safe interaction in confined spaces. The framework further supports online self-supervised recalibration of the fidelity model using pseudo-labels derived from recent progress and safety outcomes, enabling adaptation to non-stationary obstacles without manual risk tuning. We evaluate this capability separately in a dedicated severe-traffic ablation. Extensive experiments in randomized grids and a Gazebo factory scenario show high success rates, shorter path length, lower overlap, and robust collision avoidance. The source code will be made publicly available upon acceptance.
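The abstract describes three connected pieces: a fidelity estimate, a risk-aware arbitration between global A* guidance and a reactive RL policy, and online recalibration of the fidelity model from pseudo-labels built from recent progress and safety outcomes. The paper's actual model, features, and thresholds are not given here, so the Python sketch below makes loud assumptions: fidelity is a two-feature logistic model (local obstacle density and inverse clearance to the nearest obstacle), the RL weight is a sigmoid of risk = 1 - fidelity, the commanded heading is a convex blend of the A* and RL headings, and a pseudo-label is 1 when the robot advanced and kept a safe clearance. All names (`FidelityModel`, `pseudo_label`, `arbitration_weight`, `fuse`) and default values are hypothetical, and the A* and RL headings are taken as inputs rather than computed.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Vec = Tuple[float, float]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class FidelityModel:
    """Hypothetical logistic estimate of local navigability in (0, 1).

    Assumed features: [local obstacle density, 1 / clearance to nearest obstacle].
    """
    weights: List[float] = field(default_factory=lambda: [-3.0, -2.0])
    bias: float = 2.0
    lr: float = 0.1

    def predict(self, feats: List[float]) -> float:
        z = self.bias + sum(w * x for w, x in zip(self.weights, feats))
        return sigmoid(z)

    def update(self, feats: List[float], label: float) -> None:
        # One SGD step on binary cross-entropy; dLoss/dz = (prediction - label).
        err = self.predict(feats) - label
        self.weights = [w - self.lr * err * x for w, x in zip(self.weights, feats)]
        self.bias -= self.lr * err


def pseudo_label(progress: float, min_clearance: float,
                 progress_eps: float = 0.1, safe_clearance: float = 0.5) -> float:
    # Assumed rule: 1.0 if the robot advanced toward its goal and kept a safe clearance.
    return 1.0 if progress > progress_eps and min_clearance > safe_clearance else 0.0


def arbitration_weight(fidelity: float, risk_threshold: float = 0.5,
                       sharpness: float = 10.0) -> float:
    # Weight on the RL policy: rises smoothly once risk = 1 - fidelity passes the threshold.
    risk = 1.0 - fidelity
    return sigmoid(sharpness * (risk - risk_threshold))


def fuse(astar_dir: Vec, rl_dir: Vec, w_rl: float) -> Vec:
    # Convex blend of the global A* heading and the reactive RL heading.
    return ((1.0 - w_rl) * astar_dir[0] + w_rl * rl_dir[0],
            (1.0 - w_rl) * astar_dir[1] + w_rl * rl_dir[1])


if __name__ == "__main__":
    model = FidelityModel()
    for name, feats in (("open", [0.05, 0.1]), ("crowded", [0.8, 1.5])):
        f = model.predict(feats)
        w = arbitration_weight(f)
        print(name, round(f, 2), round(w, 2), fuse((1.0, 0.0), (0.0, 1.0), w))
```

With the default weights, the open-area features give a fidelity near 0.84 and an RL weight near 0.03, so the robot mostly follows A*; the crowded-area features give a fidelity near 0.03 and an RL weight near 0.99. Calling `update` with a label of 1 after a safe, productive step in the crowded area raises the predicted fidelity there, which loosely mirrors the self-supervised recalibration the abstract describes, with no hand-tuned risk values changed.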

Problem

Research questions and friction points this paper is trying to address.

multi-robot exploration

dynamic environments

task allocation

execution difficulty

navigation

Innovation

Methods, ideas, or system contributions that make the work stand out.

execution fidelity

hybrid learning-planning

multi-robot exploration