Meta-Adaptive Beam Search Planning for Transformer-Based Reinforcement Learning Control of UAVs with Overhead Manipulators under Flight Disturbances

📅 2026-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of trajectory deviation in drone-mounted overhead manipulators, where body attitude disturbances hinder high-precision end-effector tracking. To overcome this, the authors propose a reinforcement learning control framework that integrates Transformer and Double Deep Q-Network (DDQN) architectures. The Transformer encodes state sequences to capture temporal dependencies, while the DDQN provides stable learning targets. Additionally, a meta-adaptive beam search mechanism leverages a critic network to prospectively evaluate multi-step action sequences over a short planning horizon, enabling model-predictive, software-in-the-loop control. Experimental results demonstrate that the proposed method improves the composite reward by 10.2% over the DDQN baseline, reduces average tracking error from 6% to 3%, achieves a 29.6% gain in the combined performance metric, and enables stable high-precision tracking within a 5 cm error bound.
📝 Abstract
Drones equipped with overhead manipulators offer unique capabilities for inspection, maintenance, and contact-based interaction. However, the motion of the drone and its manipulator is tightly coupled, and even small attitude changes caused by wind or control imperfections shift the end-effector away from its intended path. This coupling makes reliable tracking difficult and limits the direct use of learning-based arm controllers originally designed for fixed-base robots. These effects appear consistently in our tests whenever the UAV body experiences drift or rapid attitude corrections. To address this behavior, we develop a reinforcement-learning (RL) framework built on a Transformer-based double deep Q-network (DDQN), whose core idea is an adaptive planner that runs a short-horizon beam search over candidate control sequences using the learned critic as the forward estimator. This allows the controller to anticipate the end-effector's motion through simulated rollouts rather than executing those actions directly on the physical system, realizing a software-in-the-loop (SITL) approach. The lookahead relies on value estimates from a Transformer critic that processes short sequences of states, while the DDQN backbone provides the one-step targets needed to keep learning stable. Evaluated on a 3-DoF aerial manipulator under identical training conditions, the proposed meta-adaptive planner shows the strongest overall performance, with a 10.2% reward increase, a substantial reduction in mean tracking error (from about 6% to 3%), and a 29.6% improvement in the combined reward-error metric relative to the DDQN baseline. Our method also maintains stable tracking of the target tip trajectory (within a 5 cm error bound) when the drone base drifts under external disturbances, unlike the fixed-beam and Transformer-only variants.
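The critic-guided lookahead described in the abstract can be sketched as a standard beam search over discrete action sequences, where each candidate step is scored by the learned critic and advanced through a simulated rollout instead of being executed on the real vehicle. This is a minimal illustration, not the paper's implementation: `critic(state, action)` and `step_fn(state, action)` are hypothetical placeholders standing in for the Transformer critic and the SITL forward model, and the fixed `horizon`/`beam_width` values are assumptions (the paper's planner adapts them).

```python
def beam_search_plan(state, critic, step_fn, n_actions, horizon=3, beam_width=4):
    """Short-horizon beam search over discrete action sequences.

    critic(state, action) -> scalar value estimate (placeholder for the
    learned Transformer critic); step_fn(state, action) -> simulated next
    state (placeholder for the SITL rollout model).
    """
    # Each beam entry: (cumulative estimated value, action sequence, simulated state)
    beams = [(0.0, [], state)]
    for _ in range(horizon):
        candidates = []
        for value, seq, s in beams:
            for a in range(n_actions):
                q = critic(s, a)          # value estimate for taking a in s
                s_next = step_fn(s, a)    # simulated rollout, never executed on hardware
                candidates.append((value + q, seq + [a], s_next))
        # Keep only the top-k sequences by cumulative estimated value
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    best_value, best_seq, _ = beams[0]
    return best_seq  # in closed loop, only the first action would be applied
```

In a receding-horizon loop the controller would replan at every step and apply only `best_seq[0]`, which is what makes the scheme model-predictive.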
Problem

Research questions and friction points this paper is trying to address.

aerial manipulator
flight disturbances
end-effector tracking
UAV control
motion coupling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based RL
adaptive beam search
aerial manipulator
software-in-the-loop planning
disturbance-robust control