🤖 AI Summary
In automotive paint shops, frequent color changes caused by unstructured vehicle sequencing significantly increase operational costs and waste.
Method: This paper addresses vehicle resequencing optimization with multi-lane first-in-first-out (FIFO) buffers. We propose a deep reinforcement learning framework based on Proximal Policy Optimization (PPO) and, crucially, first prove that the greedy retrieval policy is optimal under fully flexible buffer configurations. Leveraging this result, we integrate an action-masking mechanism into the training process. The approach combines combinatorial optimization modeling with large-scale stochastic experiments (170 instances, 2–8 lanes, 5–15 colors).
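The action-masking idea can be sketched independently of the full PPO setup: actions that are invalid in the current state (for example, storing into a full lane) have their logits set to negative infinity before the softmax, so the policy assigns them exactly zero probability and never samples them. This is a generic illustration under assumed names (`masked_policy`, `logits`, `mask`), not the paper's implementation.

```python
import numpy as np

def masked_policy(logits, mask):
    """Turn raw action logits into a probability distribution,
    assigning exactly zero probability to masked-out actions.

    logits: float array of per-action scores.
    mask:   boolean array, True where the action is currently valid.
    """
    # Invalid actions get -inf, so exp(...) maps them to 0.
    masked = np.where(mask, logits, -np.inf)
    # Standard numerically stable softmax over the masked logits.
    shifted = masked - masked.max()
    probs = np.exp(shifted)
    return probs / probs.sum()
```

During PPO training, sampling and the log-probabilities used in the policy-gradient loss would both be taken from this masked distribution, so gradient signal never flows toward infeasible actions.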
Contribution/Results: The method substantially reduces the number of color changes, with gains growing as problem scale increases. It is robust to variations in buffer capacity and to imbalanced color distributions, overcoming key limitations of conventional heuristics and simplified modeling approaches.
📝 Abstract
In the paint shop problem, an unordered incoming sequence of cars, each assigned a color, has to be reshuffled with the objective of minimizing the number of color changes. To reshuffle the incoming sequence, manufacturers can employ a multi-lane first-in-first-out (FIFO) buffer system that allows store and retrieve operations. Prior studies have primarily focused on simple decision heuristics, such as greedy rules, or on simplified problem variants that do not allow full flexibility in performing store and retrieve operations. In this study, we propose a reinforcement learning approach that minimizes color changes for the flexible problem variant, in which store and retrieve operations can be performed in arbitrary order. After proving that greedy retrieval is optimal, we incorporate this finding into the model via action masking. Our evaluation on 170 problem instances with 2–8 buffer lanes and 5–15 colors shows that our approach reduces color changes compared to existing methods by considerable margins, depending on problem size. Furthermore, we demonstrate the robustness of our approach to different buffer sizes and imbalanced color distributions.
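The buffer mechanics described above can be illustrated with a small simulation. The store rule below is a simple heuristic chosen for illustration only, not the paper's learned policy; only the greedy retrieval rule (take from a lane whose head matches the last retrieved color when possible) reflects the retrieval strategy the abstract refers to.

```python
from collections import deque

def resequence(incoming, n_lanes, lane_cap):
    """Reshuffle a color sequence through multi-lane FIFO buffers.

    Store rule (illustrative heuristic): prefer a lane whose tail already
    holds the same color, then an empty lane, then any non-full lane.
    Retrieve rule (greedy): prefer a lane whose head matches the last
    retrieved color; otherwise take from the first non-empty lane.
    """
    lanes = [deque() for _ in range(n_lanes)]
    output = []

    def retrieve():
        last = output[-1] if output else None
        nonempty = [lane for lane in lanes if lane]
        match = next((lane for lane in nonempty if lane[0] == last), None)
        output.append((match or nonempty[0]).popleft())

    for color in incoming:
        if all(len(lane) == lane_cap for lane in lanes):
            retrieve()  # buffer completely full: must retrieve first
        target = next((lane for lane in lanes
                       if lane and lane[-1] == color and len(lane) < lane_cap),
                      None)
        if target is None:
            target = next((lane for lane in lanes if not lane), None)
        if target is None:
            target = next(lane for lane in lanes if len(lane) < lane_cap)
        target.append(color)

    while any(lanes):  # drain the buffer greedily
        retrieve()
    return output

def color_changes(seq):
    """Number of adjacent pairs with differing colors."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)
```

For example, the alternating input `['R', 'B', 'R', 'B', 'R', 'B']` has five color changes; with two lanes of capacity three, even these simple rules regroup it into a single block per color.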