Coordinated Multi-Drone Last-mile Delivery: Learning Strategies for Energy-aware and Timely Operations

📅 2025-09-19

📈 Citations: 0

✨ Influential: 0

career value

179K/year

🤖 AI Summary

This work addresses the energy-constrained and time-sensitive multi-UAV last-mile delivery problem. We propose a multi-agent reinforcement learning framework based on the Actor-Critic architecture to jointly optimize base station deployment, flight zone partitioning, and dynamic path planning. Our approach innovatively integrates K-means clustering for pre-grouping, multi-agent cooperative decision-making, and adaptive path-policy selection, enabling joint long-term optimization of energy efficiency and latency. The framework is trained and validated on real-world logistics data. Experimental results demonstrate that, compared to baseline methods, our solution reduces average delivery latency by 23.6% and energy consumption by 18.4%. Moreover, the learned base station deployment strategy exhibits strong transferability and practical deployability. This work establishes a novel paradigm for intelligent, cooperative UAV scheduling in sustainable urban logistics systems.

Technology Category

Application Category

📝 Abstract

Drones have recently emerged as a faster, safer, and cost-efficient way for last-mile deliveries of parcels, particularly for urgent medical deliveries highlighted during the pandemic. This paper addresses a new challenge of multi-parcel delivery with a swarm of energy-aware drones, accounting for time-sensitive customer requirements. Each drone plans an optimal multi-parcel route within its battery-restricted flight range to minimize delivery delays and reduce energy consumption. The problem is tackled by decomposing it into three sub-problems: (1) optimizing depot locations and service areas using K-means clustering; (2) determining the optimal flight range for drones through reinforcement learning; and (3) planning and selecting multi-parcel delivery routes via a new optimized plan selection approach. To integrate these solutions and enhance long-term efficiency, we propose a novel algorithm leveraging actor-critic-based multi-agent deep reinforcement learning. Extensive experimentation using realistic delivery datasets demonstrate an exceptional performance of the proposed algorithm. We provide new insights into economic efficiency (minimize energy consumption), rapid operations (reduce delivery delays and overall execution time), and strategic guidance on depot deployment for practical logistics applications.

Problem

Research questions and friction points this paper is trying to address.

Optimizing multi-drone delivery routes under battery constraints

Minimizing delivery delays for time-sensitive parcel operations

Reducing energy consumption in coordinated drone swarm logistics

Innovation

Methods, ideas, or system contributions that make the work stand out.

K-means clustering optimizes depot locations

Reinforcement learning determines optimal flight range

Actor-critic multi-agent deep reinforcement learning

🔎 Similar Papers

Strategic Coordination of Drones via Short-term Distributed Optimization and Long-term Reinforcement Learning

2023-11-16Citations: 1

Bosch Group

Renningen, BW, DE

Applied Science Manager, Prime Air

Amazon

Seattle, Washington, USA

Authors to Follow