🤖 AI Summary
Problem: Existing learning-based approaches for the 3D bin packing problem (3D-BPP) in logistics neglect stability constraints and suffer from poor generalization across diverse container sizes.
Method: We propose One4Many-StablePacker (O4M-SP), a deep reinforcement learning framework that handles multiple bin dimensions in a single training process. Its weighted reward function combines the loading rate with a height-difference metric to favor flatter packing layouts. It pairs clipped policy gradient optimization with a tailored policy drifting method to mitigate policy entropy collapse and to encourage exploration at critical decision nodes. It also incorporates the support and weight constraints common in practice.
Contribution/Results: In the reported experiments, the framework generalizes across varying bin sizes, significantly outperforms baseline methods in packing utilization, and handles packing scenarios with stability constraints, which supports its applicability to real-world logistics.
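The summary does not state the exact training objective. As a rough illustration only, the sketch below assumes the clipped policy gradient resembles the standard PPO clipped surrogate. The function name `clipped_surrogate`, its parameters, and the default `eps=0.2` are hypothetical, and the policy drifting method is not shown because the source gives no details of it.

```python
import math

def clipped_surrogate(new_logp, old_logp, advantages, eps=0.2):
    """Mean PPO-style clipped surrogate objective (to be maximized).

    Assumption: this is the generic PPO form, not necessarily the
    objective used by O4M-SP. All names here are illustrative.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(nl - ol)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a positive advantage, a ratio above `1 + eps` earns no extra objective value, so a single update has little incentive to move the policy far from the one that collected the data.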
📝 Abstract
The three-dimensional bin packing problem (3D-BPP) is widely applied in logistics and warehousing. Existing learning-based approaches often neglect practical stability-related constraints and exhibit limitations in generalizing across diverse bin dimensions. To address these limitations, we propose a novel deep reinforcement learning framework, One4Many-StablePacker (O4M-SP). The primary advantage of O4M-SP is its ability to handle various bin dimensions in a single training process while incorporating support and weight constraints common in practice. Our training method introduces two innovative mechanisms. First, it employs a weighted reward function that integrates loading rate and a new height difference metric for packing layouts, promoting improved bin utilization through flatter packing configurations. Second, it combines clipped policy gradient optimization with a tailored policy drifting method to mitigate policy entropy collapse, encouraging exploration at critical decision nodes during packing to avoid suboptimal solutions. Extensive experiments demonstrate that O4M-SP generalizes successfully across diverse bin dimensions and significantly outperforms baseline methods. Furthermore, O4M-SP exhibits strong practical applicability by effectively addressing packing scenarios with stability constraints.
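The abstract describes the reward only in words. The following minimal sketch shows one way a weighted reward could combine loading rate with a height-difference penalty. The normalization, the use of a top-down height map, the weight `alpha`, and all function names are assumptions made for illustration, not the paper's definitions.

```python
def loading_rate(item_volumes, bin_dims):
    """Fraction of the bin volume occupied by packed items."""
    length, width, height = bin_dims
    return sum(item_volumes) / (length * width * height)

def height_difference(height_map, bin_height):
    """Gap between the tallest and lowest cell of a top-down height map,
    normalized by bin height (assumed form of the flatness metric)."""
    cells = [h for row in height_map for h in row]
    return (max(cells) - min(cells)) / bin_height

def weighted_reward(item_volumes, height_map, bin_dims, alpha=0.8):
    """Hypothetical weighted reward: favor high utilization, penalize
    uneven surfaces. alpha is an illustrative weight, not a reported value."""
    rate = loading_rate(item_volumes, bin_dims)
    diff = height_difference(height_map, bin_dims[2])
    return alpha * rate - (1.0 - alpha) * diff

# Two layouts with the same packed volume in a 2 x 2 x 2 bin.
flat = weighted_reward([4.0], [[1, 1], [1, 1]], (2, 2, 2))
uneven = weighted_reward([4.0], [[2, 2], [0, 0]], (2, 2, 2))
```

Both layouts have the same loading rate, so the flat one scores higher purely because of the height-difference term. This mirrors the abstract's claim that flatter configurations are encouraged.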