Learning-Based Vision Systems for Semi-Autonomous Forklift Operation in Industrial Warehouse Environments

📅 2025-11-09
🤖 AI Summary
This work addresses the need for low-cost, robust visual perception of pallets and their apertures in industrial warehouse environments. Method: We propose a lightweight monocular vision framework integrating YOLOv8 and an enhanced YOLOv11 architecture, augmented with Optuna-driven hyperparameter optimization and a novel spatial association mapping module for pallet apertures. The model is trained and validated on a custom multi-scene warehouse dataset comprising real-world images. Contributions/Results: (1) We introduce an end-to-end geometric consistency post-processing mechanism that jointly enforces spatial coherence between aperture locations and pallet geometry, significantly improving localization accuracy and structured output reliability; (2) We empirically demonstrate YOLOv11's superior convergence stability and mAP performance over baseline models, achieving an optimal trade-off between accuracy and deployment efficiency. Experiments confirm the system's effectiveness in enabling semi-autonomous forklift operations, underscoring its scalability and practical engineering value.

📝 Abstract
The automation of material handling in warehouses increasingly relies on robust, low-cost perception systems for forklifts and Automated Guided Vehicles (AGVs). This work presents a vision-based framework for pallet and pallet hole detection and mapping using a single standard camera. We use the YOLOv8 and YOLOv11 architectures, enhanced through Optuna-driven hyperparameter optimization and spatial post-processing. A pallet hole mapping module converts the detections into actionable spatial representations, enabling accurate association of pallets with their holes for forklift operation. Experiments on a custom dataset augmented with real warehouse imagery show that YOLOv8 achieves high pallet and pallet hole detection accuracy, while YOLOv11, particularly under optimized configurations, offers superior precision and stable convergence. The results demonstrate the feasibility of a cost-effective, retrofittable visual perception module for forklifts. This study proposes a scalable approach to advancing warehouse automation, promoting safer, more economical, and more intelligent logistics operations.
Problem

Research questions and friction points this paper is trying to address.

Developing vision systems for pallet detection in warehouse forklift automation
Creating cost-effective perception modules using single standard cameras
Enabling accurate pallet hole mapping for autonomous forklift operations
Innovation

Methods, ideas, or system contributions that make the work stand out.

YOLOv8 and YOLOv11 for pallet detection
Optuna hyperparameter optimization enhances performance
Pallet hole mapping enables spatial representation
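
The page does not say how the pallet hole mapping module associates holes with pallets. As a rough illustration only, the sketch below assumes a simple rule: a hole belongs to the smallest pallet box that contains the hole's center, and the midpoint between a pallet's outermost holes serves as a candidate fork alignment point. The names `Box`, `associate_holes`, and `entry_point`, and the sample coordinates, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned detection box in pixels: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def center(self):
        return ((self.x1 + self.x2) / 2.0, (self.y1 + self.y2) / 2.0)

    @property
    def area(self):
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    def contains(self, point):
        x, y = point
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


def associate_holes(pallets, holes):
    """Map each pallet index to the indices of its holes, ordered left to right.

    Assumed rule (not from the paper): a hole is assigned to the smallest
    pallet box containing the hole's center; holes inside no pallet are dropped.
    """
    mapping = {i: [] for i in range(len(pallets))}
    for h_idx, hole in enumerate(holes):
        candidates = [i for i, p in enumerate(pallets) if p.contains(hole.center)]
        if candidates:
            best = min(candidates, key=lambda i: pallets[i].area)
            mapping[best].append(h_idx)
    for hole_ids in mapping.values():
        hole_ids.sort(key=lambda h: holes[h].center[0])
    return mapping


def entry_point(holes, hole_ids):
    """Midpoint between the two outermost hole centers, or None with fewer than two holes."""
    if len(hole_ids) < 2:
        return None
    (lx, ly) = holes[hole_ids[0]].center
    (rx, ry) = holes[hole_ids[-1]].center
    return ((lx + rx) / 2.0, (ly + ry) / 2.0)


# Made-up detections: two pallets and three holes, in pixel coordinates.
pallets = [Box(100, 200, 500, 320), Box(600, 210, 900, 300)]
holes = [Box(330, 250, 450, 300), Box(150, 250, 270, 300), Box(640, 240, 720, 280)]

mapping = associate_holes(pallets, holes)
print(mapping)                          # {0: [1, 0], 1: [2]}
print(entry_point(holes, mapping[0]))   # (300.0, 275.0)
```

A containment rule like this is cheap and easy to debug, but it ignores occlusion and perspective. A real system would likely add confidence thresholds and geometric sanity checks on hole spacing.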
Vamshika Sutar
Department of Civil Engineering, Indian Institute of Technology Bombay, India
Mahek Maheshwari
Department of Civil Engineering, Indian Institute of Technology Bombay, India
Archak Mittal
Indian Institute of Technology Bombay
Transportation Engineering, Connected Automated Vehicles, Traffic Signals