🤖 AI Summary
Cooperative air-combat decision-making among heterogeneous fighter aircraft poses significant challenges due to dynamic adversarial environments, high-dimensional state-action spaces, and stringent real-time constraints.
Method: This paper proposes a hierarchical multi-agent reinforcement learning (MARL) framework featuring a novel “command–control” two-layer decoupled architecture. Macro-level mission planning and micro-level flight-dynamics-driven maneuver coordination are jointly optimized via policy symmetry constraints and curriculum learning to accelerate convergence and ensure real-time responsiveness.
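The curriculum-learning component of the method can be illustrated with a minimal sketch. The stage definitions, threshold, and `train_fn` interface below are hypothetical (the summary does not specify them); the sketch only shows the general pattern of advancing training through scenarios of increasing complexity once a success criterion is met.

```python
# Hypothetical curriculum stages of increasing complexity; the actual
# stage design used in the paper is not given in the summary.
CURRICULUM = [
    {"name": "basic_flight", "opponents": 0},
    {"name": "1v1_combat", "opponents": 1},
    {"name": "2v2_coordination", "opponents": 2},
]

def train_with_curriculum(train_fn, success_threshold=0.8):
    """Advance to the next stage once the policy clears the
    success threshold on the current one; the policy carries over."""
    policy = None
    for stage in CURRICULUM:
        success = 0.0
        while success < success_threshold:
            policy, success = train_fn(policy, stage)
    return policy

# Toy stand-in for an RL update step: success rises with training.
def dummy_train_fn(policy, stage):
    steps = (policy or 0) + 1
    return steps, min(1.0, 0.5 + 0.2 * steps)

final_policy = train_with_curriculum(dummy_train_fn)
```

In a real training loop, `train_fn` would run episodes of the stage's scenario and return the updated policy together with its evaluated win or completion rate.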
Contribution/Results: (1) A scalable, interpretable, and deployable intelligent air combat simulation system is developed; (2) High-win-rate tactical policies are generated in a zero-risk, low-cost digital environment; (3) Experiments demonstrate statistically significant improvements in multi-aircraft mission completion rate (+27.4%) and win rate (+31.6%), validating the framework's effectiveness and generalizability to real-world air defense operations.
📝 Abstract
This work presents a Hierarchical Multi-Agent Reinforcement Learning framework for analyzing simulated air combat scenarios involving heterogeneous agents. The objective is to identify effective Courses of Action that lead to mission success within preset simulations, thereby enabling the exploration of real-world defense scenarios at low cost and in a safe-to-fail setting. Applying deep Reinforcement Learning in this context poses specific challenges, such as complex flight dynamics, the exponential growth of the state and action spaces in multi-agent systems, and the need to integrate real-time control of individual units with look-ahead planning. To address these challenges, the decision-making process is split into two levels of abstraction: low-level policies control individual units, while a high-level commander policy issues macro commands aligned with the overall mission targets. This hierarchical structure facilitates the training process by exploiting policy symmetries of individual agents and by separating control from command tasks. The low-level policies are trained for individual combat control in a curriculum of increasing complexity. The high-level commander is then trained on mission targets given pre-trained control policies. The empirical validation confirms the advantages of the proposed framework.
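The two-level decomposition described above can be sketched as follows. All names, command vocabulary, and decision rules here are hypothetical placeholders (the abstract does not specify them); the trained neural policies are replaced by simple hand-written logic to show only the control flow: the commander maps the global state to one macro command per unit, and a single shared low-level policy (reflecting the policy symmetry across agents) turns each unit's local observation plus its command into a maneuver.

```python
from dataclasses import dataclass

# Hypothetical macro-command vocabulary for the commander.
MACRO_COMMANDS = ["engage", "evade", "patrol"]

@dataclass
class UnitObservation:
    position: tuple
    fuel: float
    threat_level: float

class LowLevelPolicy:
    """Controls a single unit; one shared instance can serve all
    symmetric agents (the policy symmetry exploited in training)."""
    def act(self, obs: UnitObservation, command: str) -> str:
        # Placeholder logic standing in for a trained neural policy.
        if command == "evade" or obs.threat_level > 0.8:
            return "break_turn"
        if command == "engage":
            return "pursue_target"
        return "hold_course"

class Commander:
    """High-level policy: maps the global state (all unit
    observations) to one macro command per unit."""
    def act(self, unit_obs: list) -> list:
        commands = []
        for obs in unit_obs:
            if obs.threat_level > 0.5:
                commands.append("evade")
            elif obs.fuel > 0.3:
                commands.append("engage")
            else:
                commands.append("patrol")
        return commands

def step(commander, policy, unit_obs):
    """One decision cycle: macro command first, then per-unit control."""
    commands = commander.act(unit_obs)
    return [policy.act(o, c) for o, c in zip(unit_obs, commands)]

obs = [
    UnitObservation((0, 0), fuel=0.9, threat_level=0.2),
    UnitObservation((5, 2), fuel=0.9, threat_level=0.9),
]
actions = step(Commander(), LowLevelPolicy(), obs)
print(actions)  # ['pursue_target', 'break_turn']
```

Separating the two layers this way lets the low-level policy be trained first on individual combat control and then frozen while the commander is trained against mission targets, which is the training order the abstract describes.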