🤖 AI Summary
Problem: AI research in supply chain management is hindered by the scarcity of high-quality, publicly available datasets, particularly for delivery delay prediction. Existing datasets are often proprietary, small-scale, or inconsistently maintained, which undermines reproducibility and standardized benchmarking. Method: We introduce SynDelay, the first open-source synthetic dataset specifically designed for delivery delay prediction. It is generated using a generative model trained on real logistics data and refined via statistical calibration, so that it preserves authentic delivery patterns and distributional properties while protecting privacy. Contribution/Results: We publicly release the dataset, generation code, benchmark models, and standardized evaluation metrics via the Supply Chain Data Hub. This establishes an initial performance baseline, fosters reproducible research, and advances the development of standardized evaluation frameworks for delivery delay prediction.
📝 Abstract
Artificial intelligence (AI) is transforming supply chain management, yet progress in predictive tasks, such as delivery delay prediction, remains constrained by the scarcity of high-quality, openly available datasets. Existing datasets are often proprietary, small, or inconsistently maintained, hindering reproducibility and benchmarking. We present SynDelay, a synthetic dataset designed for delivery delay prediction. Generated using an advanced generative model trained on real-world data, SynDelay preserves realistic delivery patterns while ensuring privacy. Although not entirely free of noise or inconsistencies, it provides a challenging and practical testbed for advancing predictive modelling. To support adoption, we provide baseline results and evaluation metrics as initial benchmarks, serving as reference points rather than state-of-the-art claims. SynDelay is publicly available through the Supply Chain Data Hub, an open initiative promoting dataset sharing and benchmarking in supply chain AI. We encourage the community to contribute datasets, models, and evaluation practices to advance research in this area. All code is openly accessible at https://supplychaindatahub.org.
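As a rough illustration of what a reference-point baseline with evaluation metrics might look like, the sketch below scores a majority-class predictor on a toy binary "delayed / on time" label. It is not the authors' benchmark code. The label encoding (1 = delayed, 0 = on time), the choice of metrics, the toy labels, and the helper names `majority_baseline` and `binary_metrics` are all assumptions made for illustration; the abstract does not specify SynDelay's actual schema or metrics.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda rows: [most_common_label for _ in rows]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Define undefined ratios as 0.0 so a degenerate predictor does not crash.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels invented for this sketch (assumed encoding: 1 = delayed, 0 = on time).
train_labels = [0, 0, 1, 0, 1, 0]
test_labels = [0, 1, 0, 0, 1]

predict = majority_baseline(train_labels)
metrics = binary_metrics(test_labels, predict(test_labels))
print(metrics)
```

A majority-class predictor can reach moderate accuracy while never flagging a delay, which is why a benchmark would usually report recall and F1 on the delayed class alongside accuracy.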