VORL-EXPLORE: A Hybrid Learning Planning Approach to Multi-Robot Exploration in Dynamic Environments

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a weakness of hierarchical multi-robot exploration: when task allocation is decoupled from local navigation, robots tend to cluster, replan frequently, and cover the same areas redundantly. The authors propose a hybrid learning-and-planning framework that couples task assignment with motion execution through “execution fidelity,” a shared estimate of local navigability. This fidelity signal is built into a Voronoi-based goal assignment objective with inter-robot repulsion to reduce contention between robots. A risk-aware arbitration mechanism adaptively arbitrates between global A* guidance and a reactive reinforcement learning policy to balance long-range efficiency and local safety. The fidelity model can be recalibrated online in a self-supervised way using pseudo-labels, enabling adaptation to non-stationary obstacles without manual risk tuning. Experiments in randomized grid maps and a Gazebo-simulated factory scenario show high success rates, shorter path lengths, lower coverage overlap, and robust collision avoidance.
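To make the idea of fidelity-aware goal assignment more concrete, here is a minimal Python sketch. It is not the authors' implementation: it replaces the Voronoi objective with a simple greedy loop, and the function name `assign_goals`, the parameters `repulsion_radius` and `fidelity_weight`, and the cost formula are all hypothetical choices made for illustration only.

```python
import math


def assign_goals(robots, frontiers, fidelity,
                 repulsion_radius=2.0, fidelity_weight=5.0):
    """Greedy frontier assignment with a fidelity penalty (illustrative only).

    robots:    dict mapping robot name -> (x, y) position
    frontiers: list of (x, y) candidate goals
    fidelity:  dict mapping frontier -> value in [0, 1];
               higher means the goal is assumed easier to reach
    """
    assignments = {}
    claimed = []  # goals already handed out to earlier robots
    for name, pos in robots.items():
        best, best_cost = None, math.inf
        for goal in frontiers:
            # Mutual exclusion (assumed form): skip goals that lie too
            # close to a goal another robot has already claimed.
            if any(math.dist(goal, c) < repulsion_radius for c in claimed):
                continue
            # Assumed cost: travel distance plus a penalty that grows
            # as the estimated execution fidelity drops.
            cost = math.dist(pos, goal) + fidelity_weight * (1.0 - fidelity[goal])
            if cost < best_cost:
                best, best_cost = goal, cost
        if best is not None:
            assignments[name] = best
            claimed.append(best)
    return assignments


robots = {"r1": (0, 0), "r2": (1, 0)}
frontiers = [(2, 0), (2, 1), (8, 0)]
fidelity = {(2, 0): 0.1, (2, 1): 0.9, (8, 0): 0.8}
result = assign_goals(robots, frontiers, fidelity)
print(result)
```

In this toy setup the nearest frontier `(2, 0)` has low fidelity, so `r1` prefers the slightly farther `(2, 1)`, and `r2` is pushed away from that neighbourhood to `(8, 0)`. A real system would assign goals jointly rather than in a fixed robot order.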

📝 Abstract
Hierarchical multi-robot exploration commonly decouples frontier allocation from local navigation, which can make the system brittle in dense and dynamic environments. Because the allocator lacks direct awareness of execution difficulty, robots may cluster at bottlenecks, trigger oscillatory replanning, and generate redundant coverage. We propose VORL-EXPLORE, a hybrid learning and planning framework that addresses this limitation through execution fidelity, a shared estimate of local navigability that couples task allocation with motion execution. This fidelity signal is incorporated into a fidelity-coupled Voronoi objective with inter-robot repulsion to reduce contention before it emerges. It also drives a risk-aware adaptive arbitration mechanism between global A* guidance and a reactive reinforcement learning policy, balancing long-range efficiency with safe interaction in confined spaces. The framework further supports online self-supervised recalibration of the fidelity model using pseudo-labels derived from recent progress and safety outcomes, enabling adaptation to non-stationary obstacles without manual risk tuning. We evaluate this capability separately in a dedicated severe-traffic ablation. Extensive experiments in randomized grids and a Gazebo factory scenario show high success rates, shorter path length, lower overlap, and robust collision avoidance. The source code will be made publicly available upon acceptance.
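The arbitration and recalibration loop described in the abstract can be sketched roughly as follows. This is an assumption-laden illustration, not the paper's method: the logistic `FidelityModel`, the `pseudo_label` rule with its `min_progress` threshold, and the linear blending in `arbitrate` are all hypothetical stand-ins, since the abstract does not specify the model, the labeling rule, or whether arbitration blends or switches between the two controllers.

```python
import math


class FidelityModel:
    """Tiny logistic model: fidelity = sigmoid(w . features + b). Hypothetical."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, features):
        z = self.b + sum(w * x for w, x in zip(self.w, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, label):
        # One step of stochastic gradient descent on the logistic loss.
        err = self.predict(features) - label
        self.w = [w - self.lr * err * x for w, x in zip(self.w, features)]
        self.b -= self.lr * err


def pseudo_label(progress_m, collided, min_progress=0.2):
    """Assumed rule: a step counts as 'good' if the robot advanced and stayed safe."""
    return 1.0 if (progress_m >= min_progress and not collided) else 0.0


def arbitrate(astar_cmd, rl_cmd, fidelity):
    """Blend velocity commands: trust A* when fidelity is high, RL when it is low."""
    alpha = min(1.0, max(0.0, fidelity))
    return tuple(alpha * a + (1.0 - alpha) * r for a, r in zip(astar_cmd, rl_cmd))


model = FidelityModel(n_features=2)
features = [1.0, 0.5]  # e.g. local obstacle density, nearby robot count (assumed)
before = model.predict(features)

# Simulate repeated poor outcomes (no progress, collision) in this context.
for _ in range(20):
    model.update(features, pseudo_label(progress_m=0.0, collided=True))

after = model.predict(features)
cmd = arbitrate(astar_cmd=(1.0, 0.0), rl_cmd=(0.0, 1.0), fidelity=after)
print(before, after, cmd)
```

After a run of poor outcomes the predicted fidelity drops below its initial value, so the blended command leans toward the reactive policy. A hard switch at a threshold would be an equally plausible reading of "arbitration".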
Problem

Research questions and friction points this paper is trying to address.

multi-robot exploration
dynamic environments
task allocation
execution difficulty
navigation
Innovation

Methods, ideas, or system contributions that make the work stand out.

execution fidelity
hybrid learning-planning
multi-robot exploration
adaptive arbitration
self-supervised recalibration
Authors
Ning Liu
The University of Western Australia
Sen Shen
The Chinese University of Hong Kong
Zheng Li
The University of Western Australia
Sheng Liu
KTH Royal Institute of Technology
trustworthy AI, federated learning, security and privacy, intelligent transportation
Dongkun Han
The Chinese University of Hong Kong
Shangke Lyu
Westlake University
Robot control, Learning control, Human-robot Interaction
Thomas Braunl
The University of Western Australia