🤖 AI Summary
To address the scalability limitations of human-in-the-loop deep reinforcement learning (HITL DRL) in complex real-world decision-making tasks, such as UAV adversarial defense, this paper proposes a hierarchical HITL DRL framework that unifies three modalities of human input (reward shaping, action correction, and expert demonstrations) while synergistically integrating self-learning, imitation learning, and transfer learning. Its key contributions include: (i) the first hierarchical human-AI collaboration architecture; (ii) an analysis revealing the nonlinear trade-off between the amount of human input and training efficiency; and (iii) a dynamic collaboration paradigm tailored to sophisticated scenarios such as decoy attacks. Evaluated on the Cogment platform in a multi-UAV adversarial-interception simulation, the framework achieves over 40% faster convergence along with significantly improved policy stability and final performance. Empirical results further show that human guidance reduces gradient variance, mitigating both over-training and under-training.
📝 Abstract
With the growing popularity of deep reinforcement learning (DRL), the human-in-the-loop (HITL) approach has the potential to revolutionize the way we approach decision-making problems and to create new opportunities for human-AI collaboration. In this article, we introduce a novel multi-layered hierarchical HITL DRL algorithm that comprises three types of learning: self-learning, imitation learning, and transfer learning. In addition, we consider three forms of human input: reward, action, and demonstration. Furthermore, we discuss the main challenges, trade-offs, and advantages of HITL in solving complex problems, and how human information can be integrated into the AI solution systematically. To verify our technical results, we present a real-world unmanned aerial vehicle (UAV) problem wherein a number of enemy drones attack a restricted area. The objective is to design a scalable HITL DRL algorithm for ally drones to neutralize the enemy drones before they reach the area. To this end, we first implement our solution using Cogment, an award-winning open-source HITL software platform. We then demonstrate several interesting results: (a) HITL leads to faster training and higher performance; (b) advice acts as a guiding direction for gradient methods and lowers variance; and (c) the amount of advice should be neither too large nor too small, to avoid over-training and under-training. Finally, we illustrate the role of human-AI cooperation in solving two complex real-world scenarios, i.e., overloaded and decoy attacks.
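To make the three human-input channels concrete, here is a minimal, self-contained sketch of how reward, action, and demonstration inputs can each enter a learning loop. It is purely illustrative: the toy lane-interception environment, tabular Q-learning stand-in for DRL, and all function names (`human_reward_shaping`, `human_action_correction`, the demonstration seeding) are assumptions for exposition, not the paper's Cogment-based implementation.

```python
import random

random.seed(0)

# Toy 1-D interception task: the agent picks the lane (0..4) where the
# intruder appears. Tabular value learning stands in for DRL; the three
# human channels mirror reward, action, and demonstration inputs.
N_LANES = 5
TARGET = 3  # lane the intruder uses in this toy setting


def env_reward(action):
    """Environment reward: 1 for intercepting, 0 otherwise."""
    return 1.0 if action == TARGET else 0.0


def human_reward_shaping(action):
    """Human input 1 (reward): small bonus for near-miss lanes."""
    return 0.5 if abs(action - TARGET) == 1 else 0.0


def human_action_correction(action):
    """Human input 2 (action): override a clearly wrong choice."""
    return TARGET if abs(action - TARGET) > 2 else action


def train(use_human=True, episodes=200, eps=0.2, lr=0.5):
    q = [0.0] * N_LANES
    if use_human:
        # Human input 3 (demonstration): seed values from expert picks.
        for _ in range(3):
            q[TARGET] += lr * (1.0 - q[TARGET])
    for _ in range(episodes):
        if random.random() < eps:
            a = random.randrange(N_LANES)        # explore
        else:
            a = max(range(N_LANES), key=q.__getitem__)  # exploit
        if use_human:
            a = human_action_correction(a)
        r = env_reward(a)
        if use_human:
            r += human_reward_shaping(a)
        q[a] += lr * (r - q[a])                  # incremental value update
    return q


q = train(use_human=True)
print(max(range(N_LANES), key=q.__getitem__))    # greedy lane after training
```

The design point this sketch mirrors is that the three channels act at different places in the loop: demonstrations before training, action corrections between the policy and the environment, and reward shaping inside the update itself.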