Curriculum RL meets Monte Carlo Planning: Optimization of a Real World Container Management Problem

📅 2025-03-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the safety-efficiency trade-off in container management at waste sorting facilities—exacerbated by delayed rewards, sparse critical events, and high-dimensional uncertainty—this paper proposes a curriculum-learning-enhanced Proximal Policy Optimization (PPO) framework, integrating offline pairwise collision modeling with Monte Carlo safety planning. At inference time, the method achieves proactive collision avoidance with minimal computational overhead, overcoming the generalization bottleneck of conventional reinforcement learning under strict safety constraints. Experiments demonstrate a significant reduction in safety-limit violations, improved collision avoidance, sustained high throughput, and strong scalability across varying container-to-processing-unit ratios. The core innovation lies in combining a curriculum-driven policy training paradigm with a lightweight offline collision model, enabling robust, sample-efficient, and safety-aware decision-making.

📝 Abstract
In this work, we augment reinforcement learning with an inference-time collision model to ensure safe and efficient container management in a waste-sorting facility with limited processing capacity. Each container has two optimal emptying volumes that trade off higher throughput against overflow risk. Conventional reinforcement learning (RL) approaches struggle under delayed rewards, sparse critical events, and high-dimensional uncertainty -- failing to consistently balance higher-volume empties with the risk of safety-limit violations. To address these challenges, we propose a hybrid method comprising: (1) a curriculum-learning pipeline that incrementally trains a PPO agent to handle delayed rewards and class imbalance, and (2) an offline pairwise collision model used at inference time to proactively avert collisions with minimal online cost. Experimental results show that our targeted inference-time collision checks significantly improve collision avoidance, reduce safety-limit violations, maintain high throughput, and scale effectively across varying container-to-PU ratios. These findings offer actionable guidelines for designing safe and efficient container-management systems in real-world facilities.
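The inference-time safety layer described in the abstract can be sketched as an action filter in front of the trained PPO policy: before committing to an empty, the agent checks the offline pairwise collision model against all already-scheduled containers and vetoes risky choices. The function below is an illustrative assumption, not the paper's implementation; `safe_action`, the probability arrays, and the 0.1 risk threshold are hypothetical placeholders.

```python
import numpy as np

def safe_action(policy_probs, pairwise_collision_prob, scheduled, threshold=0.1):
    """Return the highest-probability action whose predicted collision risk
    against every already-scheduled container stays below the threshold,
    or None to defer when no action is safe this step."""
    probs = np.array(policy_probs, dtype=float)
    for a in range(len(probs)):
        # worst-case pairwise risk of emptying container `a` now
        risk = max((pairwise_collision_prob[a][s] for s in scheduled), default=0.0)
        if risk > threshold:
            probs[a] = 0.0  # veto this empty: too likely to collide
    if probs.sum() == 0.0:
        return None  # defer the decision rather than violate a safety limit
    return int(np.argmax(probs))
```

Because the pairwise model is trained offline, the online cost is a handful of table lookups per step, which is consistent with the paper's claim of minimal inference-time overhead.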
Problem

Research questions and friction points this paper is trying to address.

Optimize container management in waste-sorting facilities with limited capacity
Balance throughput and overflow risk using reinforcement learning and collision models
Address delayed rewards, sparse events, and high-dimensional uncertainty in RL
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curriculum-learning pipeline for delayed rewards
Inference-time collision model for safety
Hybrid PPO agent for container management
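In outline, the curriculum-learning pipeline above could be staged as below: training progresses from short, forgiving episodes toward the full delayed-reward, strict-safety task, warm-starting each stage from the previous policy. Everything here is a hedged sketch; `make_env`, `train_ppo`, and the stage parameters are hypothetical placeholders, not the paper's actual schedule.

```python
# Illustrative curriculum: horizons lengthen and overflow penalties tighten,
# so early stages give the PPO agent dense feedback before the sparse,
# delayed-reward regime of the full task.
CURRICULUM = [
    {"horizon": 50,  "overflow_penalty": 1.0},   # stage 1: short, forgiving
    {"horizon": 200, "overflow_penalty": 5.0},   # stage 2: longer, stricter
    {"horizon": 500, "overflow_penalty": 20.0},  # stage 3: full task
]

def train_with_curriculum(make_env, train_ppo, steps_per_stage=10_000):
    """Train a PPO agent through the curriculum stages, warm-starting each
    stage from the previous stage's policy."""
    agent = None
    for stage in CURRICULUM:
        env = make_env(**stage)
        agent = train_ppo(env, init_agent=agent, steps=steps_per_stage)
    return agent
```

The warm-start between stages is what addresses class imbalance: rare critical events (near-overflows) occur more often per episode in the short early stages, so the policy sees them before the horizon grows.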