🤖 AI Summary
Research on last-mile delivery is hindered by the absence of a publicly available, fine-grained, multi-city delivery dataset. To address this gap, we introduce LaDe, the first large-scale, industry-sourced open dataset for last-mile logistics. It covers 10,677k packages handled by 21k couriers across multiple cities over six months of real-world operation. LaDe provides both package information (such as location and time requirements) and task-event records that capture when and where the courier was at events such as task-accept and task-finish. It spans both pick-up and delivery scenarios, which supports package-level modeling and cross-city analysis of spatio-temporal patterns. We verify LaDe on three tasks by running several classical baseline models per task. The dataset is publicly released on Hugging Face.
📝 Abstract
Real-world last-mile delivery datasets are crucial for research in logistics, supply chain management, and spatio-temporal data mining. Although many algorithms have been developed to date, no widely accepted, publicly available last-mile delivery dataset exists to support research in this field. In this paper, we introduce LaDe, the first publicly available last-mile delivery dataset with millions of packages from industry. LaDe has three unique characteristics: (1) Large-scale. It involves 10,677k packages handled by 21k couriers over 6 months of real-world operation. (2) Comprehensive information. It offers original package information, such as location and time requirements, as well as task-event information, which records when and where the courier is when events such as task-accept and task-finish occur. (3) Diversity. The dataset includes data from various scenarios, including package pick-up and delivery, and from multiple cities, each with its own spatio-temporal patterns arising from distinct characteristics such as population. We verify LaDe on three tasks by running several classical baseline models per task. We believe that the large-scale, comprehensive, and diverse nature of LaDe can offer unparalleled opportunities to researchers in the supply chain community, the data mining community, and beyond. The dataset homepage is publicly available at https://huggingface.co/datasets/Cainiao-AI/LaDe.