🤖 AI Summary
This work addresses high-speed, safe, and efficient autonomous navigation for unmanned aerial vehicles (UAVs) in cluttered environments. It proposes SAGA, a one-stage joint regression-and-ranking planning framework built on a fixed lattice of motion anchors. In a single forward pass, the method predicts a refined terminal state and a planning score for every anchor, then decodes the best candidate into a dynamically feasible trajectory. It converts anchor-aligned features into geometry-aware tokens and applies self-attention for global cross-anchor reasoning, uses a polar positional encoding derived from anchor yaw and pitch to preserve directional structure, and adds a goal-aware modulation module that injects velocity, acceleration, and target information. In the reported experiments, the approach achieves a 100% success rate at maximum speed settings of 2.0, 3.0, and 4.0 m/s. At 4.0 m/s it raises minimum safety from 0.4390 m to 0.7576 m and cuts total flight time from 40.46 s to 27.49 s relative to YOPO.
📝 Abstract
Agile unmanned aerial vehicle (UAV) navigation in cluttered environments demands a planning architecture that is both computationally efficient and structurally expressive enough to reason over multiple feasible motions. This paper presents SAGA, a robust self-attention and goal-aware anchor-based planner for safe autonomous UAV navigation. SAGA formulates local planning as a one-stage joint regression-and-ranking problem over a fixed lattice of motion anchors. Given a depth image and a body-frame motion state, the planner predicts refined terminal states and planning scores for all anchors in a single forward pass, after which the best candidate is decoded into a dynamically feasible trajectory. The key idea of SAGA is to transform anchor-aligned features into geometry-aware tokens and perform global cross-anchor reasoning with self-attention. To preserve directional structure in the token space, we further introduce a polar positional encoding (PPE) derived from anchor yaw and pitch. In addition, a goal-aware modulation module injects velocity, acceleration, and target information into the token representation before final score prediction. Experiments in cluttered pillar-map environments under maximum speed settings of 2.0, 3.0, and 4.0 m/s show that SAGA consistently achieves a 100% success rate, while across these settings YOPO drops from 90.91% to 62.50%, Ego-planner from 71.43% to 52.63%, and Fast-planner from 52.63% to 38.46%. Under the 4.0 m/s maximum speed setting, SAGA also improves on YOPO in average safety (2.3888 m versus 1.9843 m), minimum safety (0.7576 m versus 0.4390 m), and total flight time (27.4901 s versus 40.4631 s). The comparison with SAGA w/o PPE further shows that explicit polar positional encoding is critical for stable cross-anchor reasoning and safe passage selection in cluttered scenes.
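To make the anchor-token idea concrete, the following is a minimal, dependency-free sketch of three ingredients named above: a fixed lattice of anchors indexed by yaw and pitch, a positional encoding computed from those angles, and self-attention across anchors followed by score-based selection. It is not the authors' implementation. The sinusoidal form of the encoding, the lattice size and field of view, the identity query/key/value projections, the stand-in score, and the "higher score is better" convention are all assumptions made for illustration. Depth features, learned weights, and the goal-aware modulation module are omitted.

```python
import math

def anchor_lattice(n_yaw, n_pitch, yaw_fov, pitch_fov):
    """Fixed grid of (yaw, pitch) anchor directions in radians, centred on zero.
    Grid size and field of view are hypothetical parameters."""
    anchors = []
    for i in range(n_pitch):
        pitch = ((i + 0.5) / n_pitch - 0.5) * pitch_fov
        for j in range(n_yaw):
            yaw = ((j + 0.5) / n_yaw - 0.5) * yaw_fov
            anchors.append((yaw, pitch))
    return anchors

def polar_encoding(yaw, pitch, n_freq=2):
    """Sinusoidal features of the anchor direction (assumed form, not from the paper)."""
    feats = []
    for k in range(n_freq):
        f = 2 ** k
        feats += [math.sin(f * yaw), math.cos(f * yaw),
                  math.sin(f * pitch), math.cos(f * pitch)]
    return feats

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def self_attention(tokens):
    """Single-head scaled dot-product attention over all anchors.
    Q, K, V are the tokens themselves (identity projections, an assumption)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        w = softmax(logits)
        out.append([sum(wi * v[c] for wi, v in zip(w, tokens)) for c in range(d)])
    return out

def select_best(scores):
    """Index of the top-ranked anchor, assuming higher score means better."""
    return max(range(len(scores)), key=scores.__getitem__)

# Usage: a 3x3 lattice whose tokens carry only the positional encoding.
anchors = anchor_lattice(3, 3, math.radians(90), math.radians(60))
tokens = [polar_encoding(yaw, pitch) for yaw, pitch in anchors]
mixed = self_attention(tokens)           # every anchor attends to every other anchor
scores = [sum(t) for t in mixed]         # stand-in for a learned score head
best = select_best(scores)               # anchor whose terminal state would be decoded
```

In a full planner, each token would combine anchor-aligned depth features with the encoding, and the selected anchor's refined terminal state would be decoded into a trajectory. The sketch only shows how direction-dependent tokens let attention relate anchors across the whole lattice rather than scoring each anchor in isolation.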