AI Summary
This work addresses the need for low-cost, robust visual perception of pallets and their apertures in industrial warehouse environments. Method: We propose a lightweight monocular vision framework integrating YOLOv8 and an enhanced YOLOv11 architecture, augmented with Optuna-driven hyperparameter optimization and a novel spatial association mapping module for pallet apertures. The model is trained and validated on a custom multi-scene warehouse dataset comprising real-world images. Contributions/Results: (1) We introduce an end-to-end geometric consistency post-processing mechanism that jointly enforces spatial coherence between aperture locations and pallet geometry, significantly improving localization accuracy and structured output reliability; (2) We empirically demonstrate YOLOv11's superior convergence stability and mAP performance over baseline models, achieving an optimal trade-off between accuracy and deployment efficiency. Experiments confirm the system's effectiveness in enabling semi-autonomous forklift operations, underscoring its scalability and practical engineering value.
Abstract
The automation of material handling in warehouses increasingly relies on robust, low-cost perception systems for forklifts and Automated Guided Vehicles (AGVs). This work presents a vision-based framework for pallet and pallet hole detection and mapping using a single standard camera. We use the YOLOv8 and YOLOv11 architectures, enhanced through Optuna-driven hyperparameter optimization and spatial post-processing. A pallet hole mapping module converts the detections into actionable spatial representations, enabling accurate association of pallets with their holes for forklift operation. Experiments on a custom dataset augmented with real warehouse imagery show that YOLOv8 achieves high pallet and pallet hole detection accuracy, while YOLOv11, particularly under optimized configurations, offers superior precision and stable convergence. The results demonstrate the feasibility of a cost-effective, retrofittable visual perception module for forklifts. This study proposes a scalable approach to advancing warehouse automation, promoting safer, more economical, and more intelligent logistics operations.
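The pallet and pallet hole association step can be illustrated with a minimal sketch. The abstract does not specify the mapping rule, so the code below assumes a simple one: a hole is assigned to the smallest detected pallet box that contains the hole's center, and each pallet's holes are then ordered left to right. The `Box` type, the `"pallet"` and `"hole"` labels, and the `associate_holes` function are hypothetical names introduced for illustration only, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned detection box in pixel coordinates (hypothetical type)."""
    label: str  # assumed class names: "pallet" or "hole"
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def center(self):
        return ((self.x1 + self.x2) / 2.0, (self.y1 + self.y2) / 2.0)

    @property
    def area(self):
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    def contains(self, point):
        x, y = point
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


def associate_holes(detections):
    """Assign each hole to a pallet (assumed rule, see lead-in).

    Returns (pallets, mapping), where mapping[i] lists the holes assigned
    to pallets[i], sorted left to right. Holes whose center lies inside
    no pallet box are dropped as geometrically inconsistent.
    """
    pallets = [d for d in detections if d.label == "pallet"]
    holes = [d for d in detections if d.label == "hole"]
    mapping = {i: [] for i in range(len(pallets))}

    for hole in holes:
        candidates = [i for i, p in enumerate(pallets) if p.contains(hole.center)]
        if not candidates:
            continue  # stray hole: no enclosing pallet
        # If pallet boxes overlap, prefer the tightest enclosing one.
        best = min(candidates, key=lambda i: pallets[i].area)
        mapping[best].append(hole)

    for assigned in mapping.values():
        assigned.sort(key=lambda h: h.x1)
    return pallets, mapping


# Example detections, as they might come from a detector's box output.
detections = [
    Box("pallet", 0, 0, 100, 40),
    Box("pallet", 120, 0, 220, 40),
    Box("hole", 60, 20, 90, 35),
    Box("hole", 10, 20, 40, 35),
    Box("hole", 130, 20, 160, 35),
    Box("hole", 300, 20, 330, 35),  # lies outside every pallet
]
pallets, mapping = associate_holes(detections)
for i, assigned in mapping.items():
    print(f"pallet {i}: {len(assigned)} hole(s)")
```

A center-containment rule is only one possible choice. An overlap-based criterion or a check on the expected number of holes per pallet could replace it without changing the structure of the output.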