🤖 AI Summary
To address the challenges of dynamic target localization in complex, unknown environments, including poor generalization and high cold-start overhead, this paper proposes a multi-agent deep reinforcement learning framework that integrates knowledge transfer. Methodologically, it introduces, for the first time, a meta-learning-driven knowledge transfer module into a multi-agent Proximal Policy Optimization (PPO) architecture. The architecture is augmented with graph neural network-based communication and uncertainty-aware modeling (entropy regularization and Bayesian reward estimation) to enable rapid cross-scenario policy adaptation. The core contribution is a generalizable, uncertainty-aware collaborative decision-making mechanism. Evaluated in both simulation and real-world drone swarm experiments, the framework achieves a 37% reduction in localization error, a 92.5% task completion rate, and a 4.8× improvement in sample efficiency, significantly enhancing system adaptability and training convergence speed.
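
The summary names PPO with entropy regularization but does not give the loss the paper uses. As a rough illustration only, the sketch below implements the standard single-agent PPO clipped surrogate objective with an entropy bonus. The function name `ppo_entropy_loss`, its parameters, and the default values `clip_eps=0.2` and `entropy_coef=0.01` are assumptions made for this example and are not taken from the paper. The meta-learning transfer module, graph-based communication, and Bayesian reward estimation are not modeled here.

```python
import math


def ppo_entropy_loss(new_logp, old_logp, advantages, action_probs,
                     clip_eps=0.2, entropy_coef=0.01):
    """Standard PPO clipped surrogate loss with an entropy bonus.

    Hypothetical helper for illustration; not the paper's implementation.
    new_logp / old_logp: log-probabilities of the taken actions under the
        current and the data-collecting policy (one per sample).
    advantages: advantage estimate per sample.
    action_probs: full action distribution of the current policy per sample.
    Returns a scalar loss to be minimized.
    """
    surrogate = 0.0
    entropy = 0.0
    for lp_new, lp_old, adv, probs in zip(new_logp, old_logp,
                                          advantages, action_probs):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Clip the ratio to [1 - eps, 1 + eps] to limit the policy update.
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic (minimum) of the unclipped and clipped objectives.
        surrogate += min(ratio * adv, clipped * adv)
        # Shannon entropy of the action distribution (skips zero entries).
        entropy += -sum(p * math.log(p) for p in probs if p > 0.0)
    n = len(advantages)
    # Maximize surrogate and entropy, so negate both for a loss.
    return -(surrogate / n) - entropy_coef * (entropy / n)
```

With `entropy_coef > 0`, a more spread-out action distribution lowers the loss, which is the usual way entropy regularization discourages premature convergence to a deterministic policy. In a multi-agent setting, a loss of this form would typically be computed per agent or over pooled agent experience, but how the paper does this is not stated in the summary.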