🤖 AI Summary
High-speed online trajectory planning for UAVs faces a fundamental trade-off between dynamical model fidelity and real-time computational efficiency. To address this, the authors propose a multi-fidelity reinforcement learning (RL) framework that jointly trains a planning policy and a Bayesian reward estimator, which they describe as the first such integration, and that incorporates real-world flight data into closed-loop RL training to align simulated rewards with reality. The method combines multi-fidelity modeling, dynamical constraint embedding, and hardware-in-the-loop optimization. Experiments show that the learned policy generates trajectories that outperform a minimum-snap baseline in both optimality and robustness, with an average replanning latency of only 2 ms, orders of magnitude faster than the baseline's several minutes. The method is validated both in high-fidelity simulation and on physical quadrotor platforms, confirming its efficacy and generalizability across domains.
📝 Abstract
High-speed online trajectory planning for UAVs poses a significant challenge: it requires precise modeling of complex dynamics while operating under tight computational limits. This paper presents a multi-fidelity reinforcement learning (MFRL) method that learns a realistic dynamics model while simultaneously training a planning policy suitable for real-time deployment. The method co-trains a planning policy and a reward estimator; the latter predicts the performance of the policy's output and is trained efficiently through multi-fidelity Bayesian optimization. This optimization models the correlation between fidelity levels, constructing a high-fidelity model on a low-fidelity foundation so that an accurate reward model can be developed with only a limited number of high-fidelity experiments. The framework is further extended to include real-world flight experiments in the reinforcement learning loop, allowing the reward model to precisely reflect real-world constraints and broadening the policy's applicability to real-world scenarios. We present rigorous evaluations by training and testing the planning policy in both simulated and real-world environments. The resulting policy not only generates faster and more reliable trajectories than the baseline minimum-snap method, but also updates trajectories in 2 ms on average, whereas the baseline takes several minutes.
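To make the multi-fidelity idea concrete, here is a minimal, hypothetical sketch of the kind of two-fidelity surrogate the abstract alludes to: a Gaussian-process model of a cheap low-fidelity reward is corrected with a second GP fitted on a few expensive high-fidelity samples (an autoregressive, Kennedy–O'Hagan-style construction). The toy reward functions, lengthscales, and sample counts below are invented for illustration and are not taken from the paper.

```python
import numpy as np

def rbf(a, b, ls):
    # squared-exponential kernel between column vectors a (n,1) and b (m,1)
    return np.exp(-0.5 * (a - b.T) ** 2 / ls ** 2)

def gp_fit(x, y, ls, jitter=1e-6):
    # plain GP regression with fixed lengthscale; returns the posterior-mean predictor
    alpha = np.linalg.solve(rbf(x, x, ls) + jitter * np.eye(len(x)), y)
    return lambda xs: rbf(xs, x, ls) @ alpha

def reward_low(x):   # toy stand-in for a cheap, biased simulator reward
    return np.sin(8 * x)

def reward_high(x):  # toy stand-in for the expensive "true" reward
    return 1.2 * np.sin(8 * x) + 0.3 * x

rng = np.random.default_rng(0)
x_lo = np.sort(rng.uniform(0, 1, 50))[:, None]   # many cheap samples
x_hi = np.linspace(0, 1, 6)[:, None]             # few expensive samples

mu_lo = gp_fit(x_lo, reward_low(x_lo).ravel(), ls=0.1)

# autoregressive correction: reward_high ≈ rho * mu_lo + delta
m = mu_lo(x_hi)
y_hi = reward_high(x_hi).ravel()
rho = float(m @ y_hi / (m @ m))                  # least-squares scale factor
mu_delta = gp_fit(x_hi, y_hi - rho * m, ls=0.3)  # GP on the residual

def predict_high(x):
    return rho * mu_lo(x) + mu_delta(x)

x_test = np.linspace(0, 1, 201)[:, None]
err_mf = np.max(np.abs(predict_high(x_test) - reward_high(x_test).ravel()))
err_lo_only = np.max(np.abs(mu_lo(x_test) - reward_high(x_test).ravel()))
print(f"low-fidelity-only max error: {err_lo_only:.3f}, multi-fidelity: {err_mf:.3f}")
```

The point of the construction is that the residual between fidelity levels is smoother and cheaper to learn than the high-fidelity reward itself, so a handful of expensive experiments suffices to correct a surrogate built almost entirely from cheap data.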