Multi-UAV Formation Control with Static and Dynamic Obstacle Avoidance via Reinforcement Learning

📅 2024-10-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of simultaneously maintaining formation and avoiding static/dynamic obstacles during coordinated multi-UAV navigation, this paper proposes a two-stage reinforcement learning framework. In Stage I, a random search procedure automatically balances a multi-objective reward function. In Stage II, a curriculum learning strategy is integrated with an attention-based observation encoder to enable zero-shot policy transfer and adaptive navigation in high-density obstacle environments. The framework effectively mitigates three key challenges: large policy search space, multi-objective optimization, and sim-to-real transfer. Experiments demonstrate that our method achieves significantly higher collision-free rates and formation-keeping accuracy than both classical planning-based approaches and state-of-the-art RL baselines, in both simulation and real-world deployments. Ablation studies validate the critical contributions of both the curriculum learning component and the attention mechanism.
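The Stage-I idea of randomly searching for a balanced multi-objective reward can be sketched as follows. This is an illustrative toy, not the paper's implementation: the objective names, the weight normalization, and the `evaluate` stub (which just rewards balanced weight vectors) are assumptions standing in for actually training and scoring a policy.

```python
import random

# Objective names are assumptions based on the paper's stated goals.
OBJECTIVES = ["directed_flight", "obstacle_avoidance",
              "formation_maintenance", "deployability"]

def sample_weights(rng):
    """Draw one candidate reward-weight vector, normalized to sum to 1."""
    raw = [rng.random() for _ in OBJECTIVES]
    total = sum(raw)
    return {name: w / total for name, w in zip(OBJECTIVES, raw)}

def evaluate(weights):
    """Stand-in for training a policy under these weights and scoring it.
    Toy proxy: prefer balanced weight vectors (smaller max weight)."""
    return 1.0 - max(weights.values())

def random_search(n_trials=50, seed=0):
    """Keep the best-scoring weight vector over n_trials random draws."""
    rng = random.Random(seed)
    best_w, best_score = None, float("-inf")
    for _ in range(n_trials):
        w = sample_weights(rng)
        score = evaluate(w)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score

best_weights, best_score = random_search()
```

In the real pipeline, `evaluate` would be the expensive step (train a policy, measure collision-free rate and formation error); random search is attractive precisely because each trial is independent and trivially parallelizable.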

📝 Abstract
This paper tackles the challenging task of maintaining formation among multiple unmanned aerial vehicles (UAVs) while avoiding both static and dynamic obstacles during directed flight. The complexity of the task arises from its multi-objective nature, the large exploration space, and the sim-to-real gap. To address these challenges, we propose a two-stage reinforcement learning (RL) pipeline. In the first stage, we randomly search for a reward function that balances key objectives: directed flight, obstacle avoidance, formation maintenance, and zero-shot policy deployment. The second stage applies this reward function to more complex scenarios and utilizes curriculum learning to accelerate policy training. Additionally, we incorporate an attention-based observation encoder to improve formation maintenance and adaptability to varying obstacle densities. Experimental results in both simulation and real-world environments demonstrate that our method outperforms both planning-based and RL-based baselines in terms of collision-free rates and formation maintenance across static, dynamic, and mixed obstacle scenarios. Ablation studies further confirm the effectiveness of our curriculum learning strategy and attention-based encoder. Animated demonstrations are available at: https://sites.google.com/view/uav-formation-with-avoidance/.
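The abstract's attention-based observation encoder can be sketched with plain scaled dot-product attention over a variable-size set of obstacle features. All dimensions, the learned query vector, and the projection matrices below are illustrative assumptions, not the paper's architecture; the point the sketch makes is that the output embedding has a fixed size regardless of how many obstacles are observed, which is what allows one policy to adapt to varying obstacle densities.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def encode_obstacles(obstacle_feats, query, w_k, w_v):
    """Pool a variable number of obstacle observations into one embedding.

    obstacle_feats: (n, d_in) array; n may differ at every timestep.
    Returns a fixed-size (d_v,) vector via attention-weighted averaging.
    """
    keys = obstacle_feats @ w_k                      # (n, d_k)
    values = obstacle_feats @ w_v                    # (n, d_v)
    scores = keys @ query / np.sqrt(keys.shape[1])   # (n,) scaled scores
    attn = softmax(scores)                           # attention weights
    return attn @ values                             # (d_v,) embedding

# Toy parameters standing in for learned weights.
rng = np.random.default_rng(0)
d_in, d_k, d_v = 6, 8, 16
query = rng.normal(size=d_k)
w_k = rng.normal(size=(d_in, d_k))
w_v = rng.normal(size=(d_in, d_v))

few = rng.normal(size=(3, d_in))    # sparse-obstacle scene
many = rng.normal(size=(20, d_in))  # dense-obstacle scene
assert encode_obstacles(few, query, w_k, w_v).shape == (d_v,)
assert encode_obstacles(many, query, w_k, w_v).shape == (d_v,)
```

A useful side effect of this pooling is permutation invariance: reordering the obstacle list does not change the embedding, so the policy does not depend on an arbitrary sensor ordering.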
Problem

Research questions and friction points this paper is trying to address.

Maintain UAV formation while avoiding obstacles
Handle the task's multi-objective nature and large exploration space
Improve collision-free rates and formation maintenance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage reinforcement learning pipeline
Attention-based observation encoder
Curriculum learning for policy training
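The curriculum-learning contribution can be sketched as a staged training loop over increasing obstacle density, advancing only when the current stage is mastered. The stage sizes, the promotion threshold, and the `train_one_round`/`success_rate` stubs are assumptions for illustration; in the actual pipeline each round would be an RL update and success would be measured by collision-free navigation.

```python
# Obstacle counts per stage, easy to hard (illustrative values).
CURRICULUM = [2, 6, 12]

def train_one_round(policy, n_obstacles):
    """Stand-in for one RL training round: toy success-rate improvement."""
    policy[n_obstacles] = min(1.0, policy.get(n_obstacles, 0.0) + 0.2)
    return policy

def success_rate(policy, n_obstacles):
    """Stand-in evaluation: fraction of collision-free episodes."""
    return policy.get(n_obstacles, 0.0)

def run_curriculum(threshold=0.8, max_rounds=100):
    """Train stage by stage, promoting once the threshold is cleared."""
    policy = {}
    stage = 0
    for _ in range(max_rounds):
        n_obs = CURRICULUM[stage]
        policy = train_one_round(policy, n_obs)
        if success_rate(policy, n_obs) >= threshold:
            stage += 1
            if stage == len(CURRICULUM):
                break  # hardest stage mastered
    return policy, stage

policy, stage = run_curriculum()
```

The design choice the sketch highlights: rather than training directly in high-density scenes (where random exploration rarely finds collision-free trajectories), the policy accumulates competence on easier stages first, which is how curriculum learning shrinks the effective search space.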
Yuqing Xie
Tsinghua University, Beijing, 100084, China
Chao Yu
Tsinghua University, Beijing, 100084, China
Hongzhi Zang
Tsinghua University, Beijing, 100084, China
Feng Gao
Tsinghua University, Beijing, 100084, China
Wenhao Tang
Tsinghua University, Beijing, 100084, China
Jingyi Huang
Jiayu Chen
Tsinghua University, Beijing, 100084, China
Botian Xu
Tsinghua University
Yi Wu
Tsinghua University, Beijing, 100084, China; Shanghai Qizhi Institute
Yu Wang
Tsinghua University, Beijing, 100084, China