FreshRetailNet-50K: A Stockout-Annotated Censored Demand Dataset for Latent Demand Recovery and Forecasting in Fresh Retail

📅 2025-05-22
🤖 AI Summary
In retail, demand for perishable goods is often censored by stockouts: sales recorded while a product is out of stock understate true demand, so observed sales are systematically biased downward, which undermines inventory and pricing decisions. To address this, we introduce the first large-scale fresh-food dataset annotated for stockouts. It comprises 50,000 hourly store-SKU time series from 898 stores in 18 cities, covering 863 SKUs, with precise hourly stockout labels and rich contextual features (e.g., promotions, weather). We demonstrate a two-stage modeling approach: (1) reconstruct latent demand during stockout hours using the hourly annotations, then (2) train demand forecasting models on the recovered demand. Experiments show a 2.73% improvement in forecasting accuracy and a reduction in systematic demand underestimation from 7.37% to near zero, making inventory decisions more reliable. The dataset provides a platform for research in demand imputation, perishable inventory optimization, and causal retail analytics.
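A minimal sketch of why censoring biases observed sales downward, assuming the textbook censored-demand model in which recorded sales equal the smaller of true demand and available stock. The hourly figures, the per-hour stock simplification, and the `relative_bias` helper are hypothetical illustrations; they are not the paper's data and not necessarily its bias metric.

```python
# Minimal sketch: censored sales under the textbook model sales = min(demand, stock).
# All numbers are made-up illustrations, not FreshRetailNet-50K data.


def censor(demand: list[float], stock: list[float]) -> list[float]:
    """Observed sales are capped by what was available in each hour.

    Simplification (assumption): stock is given per hour; real inventory
    carries over between hours.
    """
    return [min(d, s) for d, s in zip(demand, stock)]


def relative_bias(estimate: list[float], truth: list[float]) -> float:
    """Hypothetical bias measure: (sum(estimate) - sum(truth)) / sum(truth).

    Negative values mean the estimate understates true demand.
    """
    total_truth = sum(truth)
    return (sum(estimate) - total_truth) / total_truth


demand = [4.0, 6.0, 9.0, 7.0, 5.0]   # true (latent) hourly demand
stock = [10.0, 10.0, 6.0, 2.0, 0.0]  # units available to sell in each hour
sales = censor(demand, stock)

print(sales)                                   # [4.0, 6.0, 6.0, 2.0, 0.0]
print(round(relative_bias(sales, demand), 4))  # -0.4194
```

Treating the censored `sales` as if they were demand understates the total by about 42% in this toy case; any model trained on such data inherits the downward bias unless the censored hours are identified and handled.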

📝 Abstract
Accurate demand estimation is critical for retailers in guiding inventory and pricing policies for perishable products. However, it faces a fundamental challenge from sales data censored during stockouts, where unobserved demand creates systematic policy biases. Existing datasets lack the temporal resolution and annotations needed to address this censoring effect. To fill this gap, we present FreshRetailNet-50K, the first large-scale benchmark for censored demand estimation. It comprises 50,000 store-product time series of detailed hourly sales data from 898 stores in 18 major cities, covering 863 perishable SKUs meticulously annotated for stockout events. The hourly stock-status records unique to this dataset, combined with rich contextual covariates including promotional discounts, precipitation, and temporal features, enable research beyond what existing datasets support. We demonstrate one such use case, two-stage demand modeling: first, we reconstruct the latent demand during stockouts using the precise hourly annotations; then, in the second stage, we use the recovered demand to train robust demand forecasting models. Experimental results show that this approach achieves a 2.73% improvement in prediction accuracy while reducing systematic demand underestimation from 7.37% to near-zero bias. With unprecedented temporal granularity and comprehensive real-world information, FreshRetailNet-50K opens new research directions in demand imputation, perishable inventory optimization, and causal retail analytics. The dataset's annotation quality and scale address long-standing limitations in retail AI, offering both immediate solutions and a platform for future methodological innovation. The data (https://huggingface.co/datasets/Dingdong-Inc/FreshRetailNet-50K) and code (https://github.com/Dingdong-Inc/frn-50k-baseline) are openly released.
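The two-stage use case can be sketched as follows, under loudly stated assumptions. Stage one replaces sales in annotated stockout hours with the mean in-stock sales for the same hour on other days, a deliberately simple imputation standing in for whatever recovery model the authors use. Stage two is a naive hour-of-day mean forecaster. The toy arrays and every function name here are hypothetical, and the sketch does not reproduce the paper's models or its reported figures.

```python
# Hypothetical two-stage sketch: (1) recover latent demand in stockout hours,
# (2) fit a forecaster on the recovered series. Not the paper's actual models.
from statistics import mean


def recover_demand(sales, stockout):
    """Stage 1: impute stockout hours with the mean in-stock sales at that hour.

    sales, stockout: lists of days, each a list of per-hour values / flags
    (flag True = hour annotated as a stockout, so its sales are censored).
    """
    n_hours = len(sales[0])
    hourly_means = []
    for h in range(n_hours):
        in_stock = [day[h] for day, flags in zip(sales, stockout) if not flags[h]]
        # Assumption: fall back to 0.0 if this hour was never observed in stock.
        hourly_means.append(mean(in_stock) if in_stock else 0.0)
    return [
        [hourly_means[h] if flags[h] else day[h] for h in range(n_hours)]
        for day, flags in zip(sales, stockout)
    ]


def fit_hourly_mean(series):
    """Stage 2 (naive stand-in): forecast each hour as its historical mean."""
    return [mean(day[h] for day in series) for h in range(len(series[0]))]


# Toy data: 3 days x 4 hours; day 3 sells out in hours 2-3.
sales = [
    [2.0, 5.0, 6.0, 3.0],
    [4.0, 7.0, 8.0, 5.0],
    [3.0, 6.0, 0.0, 0.0],
]
stockout = [
    [False, False, False, False],
    [False, False, False, False],
    [False, False, True, True],
]

recovered = recover_demand(sales, stockout)
naive = fit_hourly_mean(sales)          # trained on censored sales
two_stage = fit_hourly_mean(recovered)  # trained on recovered demand
print([round(x, 3) for x in naive])     # [3.0, 6.0, 4.667, 2.667]
print(two_stage)                        # [3.0, 6.0, 7.0, 4.0]
```

The forecaster fitted directly on censored sales underpredicts the hours that sold out, while the one fitted on recovered demand does not. Same-hour averaging assumes demand is roughly stable across days; a realistic stage one would also condition on covariates such as promotions and weather.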
Problem

Estimating latent demand during stockouts in retail
Addressing censored sales data challenges in demand forecasting
Providing an annotated dataset for perishable inventory optimization
Innovation

Hourly stock status records for latent demand recovery
Two-stage demand modeling with precise annotations
Large-scale dataset with rich contextual covariates
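Hourly stock-status records matter for recovery because they say which hours are censored, whereas a single daily stockout flag only says that some hours were. A minimal sketch, assuming a hypothetical per store-SKU-day record of 24 hourly sales values and 24 boolean stockout flags (this is not the dataset's actual schema):

```python
from dataclasses import dataclass


@dataclass
class DayRecord:
    """Hypothetical per store-SKU-day record; not the dataset's actual schema."""

    hourly_sales: list[float]
    hourly_stockout: list[bool]  # True = hour annotated as out of stock

    @property
    def daily_stockout(self) -> bool:
        """All a daily-granularity dataset would record: any stockout that day."""
        return any(self.hourly_stockout)

    def censored_hours(self) -> list[int]:
        """Hours whose sales understate demand and need recovery."""
        return [h for h, out in enumerate(self.hourly_stockout) if out]

    def uncensored_sales(self) -> float:
        """Sales from in-stock hours only, which equal demand under the
        usual censoring model and can be used directly for training."""
        return sum(
            s for s, out in zip(self.hourly_sales, self.hourly_stockout) if not out
        )


# Toy day: steady sales from 08:00, sold out from 18:00 onward.
rec = DayRecord(
    hourly_sales=[0.0] * 8 + [2.0] * 10 + [0.0] * 6,
    hourly_stockout=[False] * 18 + [True] * 6,
)
print(rec.daily_stockout)      # True
print(rec.censored_hours())    # [18, 19, 20, 21, 22, 23]
print(rec.uncensored_sales())  # 20.0
```

With only `daily_stockout`, the whole day would have to be treated as censored or discarded; the hourly flags keep the 18 in-stock hours as usable demand observations and isolate the 6 that need imputation.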