Multi-Agent Meta-Offline Reinforcement Learning for Timely UAV Path Planning and Data Collection

📅 2025-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional online multi-agent reinforcement learning (MARL) for collaborative path planning and data collection among multiple UAVs in dynamic wireless networks suffers from excessive communication dependency, poor environmental adaptability, and insufficient safety and generalization capabilities. Method: This paper proposes a meta-learning-enhanced offline MARL framework that uniquely integrates Conservative Q-Learning (CQL) with Model-Agnostic Meta-Learning (MAML), enabling offline training and rapid adaptation to unseen environments. We further introduce two novel paradigms: independently trained MARL (M-I-MARL) and centralized-training-with-decentralized-execution MARL (M-CTDE-MARL). Results: Experimental results demonstrate that the proposed CTDE variant achieves 50% faster convergence in dynamic scenarios, significantly improving scalability, robustness, and cross-configuration transferability of trajectory optimization and scheduling policies.

📝 Abstract
Multi-agent reinforcement learning (MARL) has been widely adopted for high-performance computing and complex data-driven decision-making in the wireless domain. However, conventional MARL schemes face many obstacles in real-world scenarios. First, most MARL algorithms are online, which can be unsafe and impractical. Second, MARL algorithms are environment-specific, meaning network configuration changes require model retraining. This letter proposes a novel meta-offline MARL algorithm that combines conservative Q-learning (CQL) and model-agnostic meta-learning (MAML). CQL enables offline training by leveraging pre-collected datasets, while MAML ensures scalability and adaptability to dynamic network configurations and objectives. We propose two algorithm variants: independent training (M-I-MARL) and centralized training with decentralized execution (M-CTDE-MARL). Simulation results show that the proposed algorithm outperforms conventional schemes; in particular, the CTDE variant achieves 50% faster convergence in dynamic scenarios than the benchmarks. The proposed framework enhances scalability, robustness, and adaptability in wireless communication systems by optimizing UAV trajectories and scheduling policies.
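The abstract's core recipe — a CQL conservative penalty for safe offline training, wrapped in a MAML-style inner/outer adaptation loop across tasks — can be sketched in a few lines. This is a toy tabular illustration, not the paper's implementation: the function names, learning rates, two-action gridworld, and the first-order (Reptile-like) outer update are all simplifying assumptions made here for brevity.

```python
import numpy as np

def cql_update(Q, batch, lr=0.1, gamma=0.9, alpha=1.0):
    """One conservative Q-learning step on a tabular Q-table.

    Besides the usual TD update, CQL adds the penalty
    logsumexp(Q[s]) - Q[s, a]: it pushes down Q-values of all actions
    (softmax-weighted) while pushing up actions actually present in the
    offline batch, discouraging overestimation of out-of-distribution
    actions.
    """
    Q = Q.copy()
    for s, a, r, s_next in batch:
        # Standard TD step toward the bootstrapped target
        td_target = r + gamma * Q[s_next].max()
        Q[s, a] += lr * (td_target - Q[s, a])
        # Gradient of the CQL penalty: softmax(Q[s]) minus one-hot(a)
        soft = np.exp(Q[s] - Q[s].max())
        soft /= soft.sum()
        grad_penalty = alpha * soft
        grad_penalty[a] -= alpha
        Q[s] -= lr * grad_penalty
    return Q

def maml_offline(tasks, n_states=4, n_actions=2, meta_lr=0.5, outer_steps=20):
    """MAML-style meta-training over several offline datasets ("tasks").

    Inner loop: adapt the shared Q-table to each task with one CQL step.
    Outer loop: move the meta-parameters toward the adapted solutions
    (a first-order, Reptile-like update, used here to keep the sketch
    short; the paper combines full MAML with CQL).
    """
    Q_meta = np.zeros((n_states, n_actions))
    for _ in range(outer_steps):
        deltas = np.zeros_like(Q_meta)
        for batch in tasks:
            Q_adapted = cql_update(Q_meta, batch)   # inner adaptation
            deltas += Q_adapted - Q_meta            # outer-loop signal
        Q_meta += meta_lr * deltas / len(tasks)
    return Q_meta
```

At deployment, the meta-trained table would be adapted to a new network configuration with a handful of `cql_update` steps on a small locally collected dataset — the "rapid adaptation to unseen environments" the abstract refers to.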
Problem

Research questions and friction points this paper is trying to address.

Multi-Agent Learning
Drone Path Planning
Adaptive Environment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conservative Q-Learning
Meta-Learning
Multi-Agent Learning