One4Many-StablePacker: An Efficient Deep Reinforcement Learning Framework for the 3D Bin Packing Problem

📅 2025-10-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing learning-based approaches for the 3D bin packing problem (3D-BPP) in logistics neglect stability constraints and suffer from poor generalization across diverse container sizes. Method: We propose a deep reinforcement learning framework enabling unified training over multiple bin dimensions. It introduces a weighted reward function and a height-difference metric to enhance loading plan flatness; incorporates clipped policy gradients and a customized policy drift mechanism to mitigate policy entropy collapse and improve exploration of critical actions; and jointly enforces load-bearing capacity and geometric stability constraints for end-to-end deployable solutions. Contribution/Results: The framework achieves strong generalization across varying bin sizes, significantly outperforms state-of-the-art baselines in packing utilization, and consistently satisfies practical stability requirements—demonstrating both efficacy and applicability in real-world logistics scenarios.
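
The listing does not give the exact form of the weighted reward; the sketch below only illustrates the idea under the assumption that the reward linearly combines the loading (volume utilization) rate with a penalty on the height difference of the packed surface, so flatter layouts are favored. All names and the coefficients `w_util` and `w_flat` are illustrative, not taken from the paper.

```python
import numpy as np

def weighted_reward(heightmap: np.ndarray,
                    packed_volume: float,
                    bin_dims: tuple,
                    w_util: float = 1.0,
                    w_flat: float = 0.5) -> float:
    """Illustrative reward: loading rate minus a height-difference (flatness) penalty.

    heightmap      -- 2D grid of current stack heights over the bin floor
    packed_volume  -- total volume of items placed so far
    bin_dims       -- (length, width, height) of the bin
    w_util, w_flat -- assumed weighting coefficients (not from the paper)
    """
    length, width, height = bin_dims
    loading_rate = packed_volume / (length * width * height)

    # One simple height-difference metric: spread of the top surface,
    # normalized by bin height so it lies in [0, 1]; flatter layouts score higher.
    height_diff = (heightmap.max() - heightmap.min()) / height

    return w_util * loading_rate - w_flat * height_diff
```

With this shape, a flat, well-filled layout scores close to its pure loading rate, while tall, uneven stacks are pushed down.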

📝 Abstract
The three-dimensional bin packing problem (3D-BPP) is widely applied in logistics and warehousing. Existing learning-based approaches often neglect practical stability-related constraints and exhibit limitations in generalizing across diverse bin dimensions. To address these limitations, we propose a novel deep reinforcement learning framework, One4Many-StablePacker (O4M-SP). The primary advantage of O4M-SP is its ability to handle various bin dimensions in a single training process while incorporating support and weight constraints common in practice. Our training method introduces two innovative mechanisms. First, it employs a weighted reward function that integrates loading rate and a new height difference metric for packing layouts, promoting improved bin utilization through flatter packing configurations. Second, it combines clipped policy gradient optimization with a tailored policy drifting method to mitigate policy entropy collapse, encouraging exploration at critical decision nodes during packing to avoid suboptimal solutions. Extensive experiments demonstrate that O4M-SP generalizes successfully across diverse bin dimensions and significantly outperforms baseline methods. Furthermore, O4M-SP exhibits strong practical applicability by effectively addressing packing scenarios with stability constraints.
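
The abstract names the second mechanism (clipped policy gradient optimization plus a tailored policy drifting method against entropy collapse) but not its exact form; below is a minimal PPO-style sketch in which an entropy-style regularizer stands in for the drift term. The function name and the `drift_coef` weight are assumptions, not the paper's formulation.

```python
import torch

def clipped_pg_loss(logp_new: torch.Tensor,
                    logp_old: torch.Tensor,
                    advantages: torch.Tensor,
                    entropy: torch.Tensor,
                    clip_eps: float = 0.2,
                    drift_coef: float = 0.01) -> torch.Tensor:
    """PPO-style clipped surrogate plus an entropy-style regularizer.

    The regularizer stands in for the paper's policy drifting term, which is
    described only as a way to keep policy entropy from collapsing so the
    agent keeps exploring critical placement decisions.
    """
    ratio = torch.exp(logp_new - logp_old)              # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()

    # Subtracting the entropy term rewards higher entropy, i.e. more exploration.
    return policy_loss - drift_coef * entropy.mean()
```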
Problem

Research questions and friction points this paper is trying to address.

Addressing 3D bin packing with stability and weight constraints (see the support-check sketch after this list)
Generalizing across diverse bin dimensions in a single training run
Improving bin utilization through flatter packing configurations
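
The exact support and load-bearing rules are not spelled out in this listing; a common geometric proxy, sketched below as an assumption, is to accept a placement only if a minimum fraction of the item's base rests on the bin floor or on the items beneath it. The `min_support` threshold and the heightmap representation are illustrative.

```python
import numpy as np

def placement_is_stable(heightmap: np.ndarray,
                        x: int, y: int, lx: int, ly: int,
                        min_support: float = 0.8) -> bool:
    """Assumed geometric-support check for an item footprint [x:x+lx, y:y+ly].

    The item must sit on a (near-)flat region: at least `min_support` of its
    base area has to touch the highest surface inside the footprint.  A full
    implementation would additionally verify that no supporting item exceeds
    its load-bearing capacity before accepting the placement.
    """
    footprint = heightmap[x:x + lx, y:y + ly]
    top = footprint.max()
    supported_fraction = np.isclose(footprint, top).mean()
    return supported_fraction >= min_support
```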
Innovation

Methods, ideas, or system contributions that make the work stand out.

Handles various bin dimensions in a single training run (see the training-loop sketch after this list)
Uses a weighted reward combining loading rate with a height-difference metric
Combines clipped policy gradients with a tailored policy drifting method
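
How the policy is conditioned on bin size is not detailed here; a minimal sketch of the one-for-many idea, assuming each episode samples a bin dimension from a pool and passes it to the policy as part of its input, is shown below. The environment and policy interfaces (`env_factory`, `policy.act`, `policy.update`) are hypothetical.

```python
import random

# Hypothetical pool of bin dimensions (L, W, H) covered by a single policy.
BIN_POOL = [(100, 100, 100), (120, 80, 90), (80, 60, 60)]

def train_one_for_many(env_factory, policy, optimizer, episodes: int = 10_000):
    """Unified training over multiple bin sizes: every episode draws a bin
    dimension from the pool, and the policy sees it as part of its input."""
    for _ in range(episodes):
        bin_dims = random.choice(BIN_POOL)
        env = env_factory(bin_dims)               # hypothetical packing environment
        obs, done = env.reset(), False
        while not done:
            action = policy.act(obs, bin_dims)    # policy conditioned on bin size
            obs, reward, done, _ = env.step(action)
        policy.update(optimizer)                  # e.g. with the clipped loss above
```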
👥 Authors
Lei Gao
S.F. Technology Co., Ltd., 518054, Shenzhen, China
Shihong Huang
Professor of Information Systems, Carnegie Mellon University
Software Engineering · Brain Computer Interaction · Human Computer Interaction · Self-adaptive Systems
Shengjie Wang
Tsinghua University
Robotics · Reinforcement learning · Bionic robotics
Hong Ma
Polytechnic Institute, Zhejiang University, 310015, Hangzhou, China
Feng Zhang
S.F. Technology Co., Ltd., 518054, Shenzhen, China
Hengda Bao
S.F. Technology Co., Ltd., 518054, Shenzhen, China
Qichang Chen
S.F. Technology Co., Ltd., 518054, Shenzhen, China
Weihua Zhou
Michigan Technological University
Medical Imaging and Informatics