VisFly-Lab: Unified Differentiable Framework for First-Order Reinforcement Learning of Quadrotor Control

📅 2026-03-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the fragmentation of existing reinforcement learning approaches for quadrotors, which lack a unified differentiable framework for operating across multiple tasks. To overcome this, the authors propose a unified differentiable simulation framework supporting four distinct tasks: hovering, trajectory tracking, landing, and racing. They further introduce an Amended Backpropagation Through Time (ABPT) algorithm that combines differentiable rollout optimization, a value-augmented objective, and visited-state initialization to mitigate the gradient bias arising from insufficient state coverage and non-differentiable rewards. Experimental results demonstrate that ABPT significantly improves performance on tasks with partially non-differentiable rewards while maintaining competitive results in fully differentiable settings, and successfully enables preliminary policy transfer to the real world.

📝 Abstract
First-order reinforcement learning with differentiable simulation is promising for quadrotor control, but practical progress remains fragmented across task-specific settings. To support more systematic development and evaluation, we present a unified differentiable framework for multi-task quadrotor control. The framework offers wrapped, extensible environments equipped with deployment-oriented dynamics, providing a common interface across four representative tasks: hovering, tracking, landing, and racing. We also present a suite of first-order learning algorithms and identify two practical bottlenecks of standard first-order training: limited state coverage caused by horizon initialization, and gradient bias caused by partially non-differentiable rewards. To address these issues, we propose Amended Backpropagation Through Time (ABPT), which combines differentiable rollout optimization, a value-based auxiliary objective, and visited-state initialization to improve training robustness. Experimental results show that ABPT yields the clearest gains in tasks with partially non-differentiable rewards, while remaining competitive in fully differentiable settings. We further provide proof-of-concept real-world deployments showing initial transferability of policies learned in the proposed framework beyond simulation.
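To make the abstract's ingredients concrete, here is a minimal sketch of the first-order training loop that ABPT-style methods build on, on a hypothetical one-dimensional hover task (linear policy u = -k·x, Euler-integrated dynamics). The loss sums the rollout's tracking cost and adds a terminal value term that stands in for rewards beyond the horizon, a simplified stand-in for the value-based auxiliary objective. Gradients flow through the simulator; here forward-mode sensitivities are propagated by hand in place of an autodiff framework, and visited-state initialization is omitted for brevity. All names and the toy dynamics are illustrative assumptions, not the paper's implementation.

```python
def rollout_loss(k, x0, horizon=10, dt=0.1, value_weight=0.5):
    """Differentiable rollout: tracking cost plus a terminal value term.

    The sensitivity d(state)/dk is propagated alongside the state
    (forward mode), yielding the same first-order gradient that
    reverse-mode BPTT would compute through the simulator.
    """
    x, dx_dk = x0, 0.0          # state and its sensitivity to the policy gain k
    loss, dloss_dk = 0.0, 0.0
    for _ in range(horizon):
        u = -k * x              # linear feedback policy
        du_dk = -x - k * dx_dk  # product rule: u depends on k directly and via x
        x = x + dt * u          # differentiable Euler dynamics step
        dx_dk = dx_dk + dt * du_dk
        loss += x * x           # quadratic tracking cost
        dloss_dk += 2 * x * dx_dk
    # value-augmented objective: a critic-like quadratic term approximates
    # the return accumulated beyond the truncated horizon
    loss += value_weight * x * x
    dloss_dk += value_weight * 2 * x * dx_dk
    return loss, dloss_dk

def train(k=0.0, x0=1.0, lr=0.5, steps=200):
    """First-order policy optimization: gradient descent on the rollout loss."""
    for _ in range(steps):
        _, g = rollout_loss(k, x0)
        k -= lr * g
    return k
```

Running `train()` drives the gain toward the value that cancels the tracking error within the horizon; in the actual framework the analytic sensitivity bookkeeping above is handled by the differentiable simulator's autodiff.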
Problem

Research questions and friction points this paper is trying to address.

quadrotor control
first-order reinforcement learning
differentiable simulation
multi-task learning
gradient bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

differentiable simulation
first-order reinforcement learning
quadrotor control
Amended Backpropagation Through Time
multi-task learning