🤖 AI Summary
To address the resource misallocation and delivery delays caused by inaccurate parcel-volume forecasting at last-mile distribution stations, this paper proposes a closed-loop “Prediction–Decision–Feedback–Explanation” framework. It employs LightGBM for high-accuracy demand forecasting; designs a context-aware reinforcement learning model based on Proximal Policy Optimization (PPO) with an asymmetric reward mechanism to optimize dynamic buffer allocation; incorporates Monte Carlo feedback for online, adaptive policy updating; and introduces a generative explainability module that combines SHAP-based feature attribution with large language models to support policy traceability and human-AI collaboration. Evaluated across 400+ real-world stations, the framework reduces Weighted Absolute Percentage Error (WAPE) by 21.65%, significantly mitigates under-buffering incidents, and enhances operational transparency and decision responsiveness.
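The asymmetric reward mechanism can be sketched as follows. This is a minimal illustration, not the paper's implementation: the penalty weights and function names are hypothetical, chosen only to show how a shortfall (under-buffering) is penalized more heavily per unit than surplus capacity (over-buffering).

```python
# Hypothetical penalty weights; the paper does not report its exact coefficients.
UNDER_PENALTY = 3.0   # assumption: unmet demand costs ~3x per parcel
OVER_PENALTY = 1.0    # assumption: idle buffered capacity costs 1x per parcel

def buffer_reward(actual_volume: float, buffered_capacity: float) -> float:
    """Negative cost reward: shortfalls cost more per unit than surplus."""
    gap = actual_volume - buffered_capacity
    if gap > 0:                       # under-buffered: unmet demand
        return -UNDER_PENALTY * gap
    return OVER_PENALTY * gap         # over-buffered: gap <= 0, mild penalty

# A 10-parcel shortfall hurts three times as much as a 10-parcel surplus.
print(buffer_reward(110, 100))  # -30.0
print(buffer_reward(90, 100))   # -10.0
```

Under this shape, a risk-neutral agent maximizing expected reward naturally over-provisions relative to the median forecast, which is the intended real-world trade-off.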
📝 Abstract
Accurate forecasting of package volumes at delivery stations is critical for last-mile logistics, where errors lead to inefficient resource allocation, higher costs, and delivery delays. We propose OpComm, a forecasting and decision-support framework that combines supervised learning with reinforcement learning-based buffer control and a generative AI-driven communication module. A LightGBM regression model generates station-level demand forecasts, which serve as context for a Proximal Policy Optimization (PPO) agent that selects buffer levels from a discrete action set. The reward function penalizes under-buffering more heavily than over-buffering, reflecting the real-world trade-off between unmet demand risk and resource inefficiency. Station outcomes are fed back through a Monte Carlo update mechanism, enabling continual policy adaptation. To enhance interpretability, a generative AI layer produces executive-level summaries and scenario analyses grounded in SHAP-based feature attributions. Across 400+ stations, OpComm reduced Weighted Absolute Percentage Error (WAPE) by 21.65% compared to manual forecasts, while lowering under-buffering incidents and improving transparency for decision-makers. This work shows how contextual reinforcement learning, coupled with predictive modeling, can address operational forecasting challenges and bridge statistical rigor and practical decision-making in high-stakes logistics environments.
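The forecast-as-context decision loop with Monte Carlo feedback can be sketched in miniature. All names here are illustrative assumptions: the discrete buffer fractions, the context bucketing of the forecast, and the tabular incremental-average update stand in for the paper's PPO policy, which updates neural policy parameters rather than a value table.

```python
BUFFER_LEVELS = [0.0, 0.05, 0.10, 0.20]  # hypothetical discrete buffer fractions

def context_bucket(forecast: float) -> int:
    """Coarse context feature: bucket the station-level forecast (hypothetical bins)."""
    return min(int(forecast // 100), 9)

def choose_buffer(forecast: float, values: dict) -> float:
    """Greedy pick of the buffer level with the best running return in this context."""
    b = context_bucket(forecast)
    return max(BUFFER_LEVELS, key=lambda a: values.get((b, a), 0.0))

def monte_carlo_update(values: dict, counts: dict,
                       forecast: float, action: float,
                       episode_return: float) -> None:
    """Monte Carlo feedback: fold each observed station outcome into an
    incremental average of returns for the (context, action) pair."""
    key = (context_bucket(forecast), action)
    counts[key] = counts.get(key, 0) + 1
    v = values.get(key, 0.0)
    values[key] = v + (episode_return - v) / counts[key]

# One feedback cycle: act, observe the realized cost, update the policy table.
values, counts = {}, {}
monte_carlo_update(values, counts, 250.0, 0.10, -5.0)
monte_carlo_update(values, counts, 250.0, 0.10, -3.0)
print(values[(2, 0.10)])  # -4.0 (running average of the two returns)
```

The same loop structure carries over when the tabular update is replaced by PPO gradient steps: the forecast supplies the state, the buffer level is the action, and realized station outcomes provide the return signal.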