SynDelay: A Synthetic Dataset for Delivery Delay Prediction

📅 2025-08-30
🤖 AI Summary
Problem: AI research in supply chain management is hindered by the scarcity of high-quality, publicly available datasets, particularly for delivery delay prediction. Existing data are often proprietary, small-scale, or inconsistently maintained, which undermines reproducibility and standardized benchmarking. Method: We introduce the first open-source synthetic dataset specifically designed for delivery delay prediction. It is generated using a generative model trained on real logistics data and refined via statistical calibration to preserve authentic delivery patterns and distributional properties while protecting privacy. Contribution/Results: We publicly release the dataset, generation code, benchmark models, and standardized evaluation metrics via the Supply Chain Data Hub. This establishes an initial performance baseline, fosters reproducible research, and advances the development of standardized evaluation frameworks for delivery delay prediction.

📝 Abstract
Artificial intelligence (AI) is transforming supply chain management, yet progress in predictive tasks, such as delivery delay prediction, remains constrained by the scarcity of high-quality, openly available datasets. Existing datasets are often proprietary, small, or inconsistently maintained, hindering reproducibility and benchmarking. We present SynDelay, a synthetic dataset designed for delivery delay prediction. Generated using an advanced generative model trained on real-world data, SynDelay preserves realistic delivery patterns while ensuring privacy. Although not entirely free of noise or inconsistencies, it provides a challenging and practical testbed for advancing predictive modelling. To support adoption, we provide baseline results and evaluation metrics as initial benchmarks, serving as reference points rather than state-of-the-art claims. SynDelay is publicly available through the Supply Chain Data Hub, an open initiative promoting dataset sharing and benchmarking in supply chain AI. We encourage the community to contribute datasets, models, and evaluation practices to advance research in this area. All code is openly accessible at https://supplychaindatahub.org.

Problem

Research questions and friction points this paper is trying to address.

Addressing scarcity of open delivery delay datasets
Providing synthetic data for supply chain prediction
Enabling benchmarking in delivery delay modeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic dataset generation using advanced generative models
Privacy-preserving realistic delivery pattern simulation
Open benchmarking with baseline metrics and code
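
To make the idea of a baseline reference point concrete, here is a minimal sketch of how a trivial baseline could be scored. It is not the paper's benchmark: the abstract does not state the task formulation or the metrics used, so the binary delayed/on-time labels, the majority-class baseline, the function names, and the toy label lists below are all assumptions made for illustration.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return the most frequent label in the training split (hypothetical baseline)."""
    return Counter(train_labels).most_common(1)[0][0]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when the positive class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels invented for this sketch: 1 = delayed, 0 = on time (assumed encoding).
train_labels = [0, 0, 1, 0, 1, 0]
test_labels = [0, 1, 0, 0, 1]

predicted_label = majority_baseline(train_labels)
test_predictions = [predicted_label] * len(test_labels)
scores = binary_metrics(test_labels, test_predictions)
print(scores)
```

A majority-class predictor can reach a reasonable accuracy while never flagging a delayed delivery, which is why reporting precision, recall, and F1 on the delayed class alongside accuracy is a common choice for this kind of task.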