Learning-Based Vision Systems for Semi-Autonomous Forklift Operation in Industrial Warehouse Environments

📅 2025-11-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: This work addresses the need for low-cost, robust visual perception of pallets and their apertures in industrial warehouse environments. Method: We propose a lightweight monocular vision framework integrating YOLOv8 and an enhanced YOLOv11 architecture, augmented with Optuna-driven hyperparameter optimization and a novel spatial association mapping module for pallet apertures. The model is trained and validated on a custom multi-scene warehouse dataset comprising real-world images. Contributions/Results: (1) We introduce an end-to-end geometric consistency post-processing mechanism that jointly enforces spatial coherence between aperture locations and pallet geometry, significantly improving localization accuracy and structured output reliability; (2) We empirically demonstrate YOLOv11’s superior convergence stability and mAP performance over baseline models, achieving an optimal trade-off between accuracy and deployment efficiency. Experiments confirm the system’s effectiveness in enabling semi-autonomous forklift operations, underscoring its scalability and practical engineering value.

📝 Abstract

The automation of material handling in warehouses increasingly relies on robust, low-cost perception systems for forklifts and Automated Guided Vehicles (AGVs). This work presents a vision-based framework for pallet and pallet hole detection and mapping using a single standard camera. We use the YOLOv8 and YOLOv11 architectures, enhanced through Optuna-driven hyperparameter optimization and spatial post-processing. A pallet hole mapping module converts the detections into actionable spatial representations, enabling accurate association of pallets and pallet holes for forklift operation. Experiments on a custom dataset augmented with real warehouse imagery show that YOLOv8 achieves high pallet and pallet hole detection accuracy, while YOLOv11, particularly under optimized configurations, offers superior precision and stable convergence. The results demonstrate the feasibility of a cost-effective, retrofittable visual perception module for forklifts. This study proposes a scalable approach to advancing warehouse automation, promoting safer, more economical, and more intelligent logistics operations.
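The abstract does not spell out how pallet holes are associated with pallets. As a rough illustration only, the sketch below assumes one plausible rule: a hole belongs to the smallest detected pallet box that contains the hole's center. The `Box` class, the `associate_holes` function, and the containment rule are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned detection box in pixel coordinates (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def center(self):
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)

    @property
    def area(self):
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    def contains(self, point):
        x, y = point
        return self.x1 <= x <= self.x2 and self.y1 <= y <= self.y2


def associate_holes(pallets, holes):
    """Assign each hole to the smallest pallet box containing its center.

    Assumed rule for illustration; the paper's actual module may differ.
    Returns (mapping, unassigned), where mapping[i] lists the holes of
    pallets[i] sorted left to right.
    """
    mapping = {i: [] for i in range(len(pallets))}
    unassigned = []
    for hole in holes:
        candidates = [i for i, p in enumerate(pallets) if p.contains(hole.center)]
        if not candidates:
            unassigned.append(hole)  # likely a false positive or an occluded pallet
            continue
        best = min(candidates, key=lambda i: pallets[i].area)
        mapping[best].append(hole)
    for pallet_holes in mapping.values():
        pallet_holes.sort(key=lambda h: h.center[0])
    return mapping, unassigned
```

Choosing the smallest containing box is one way to resolve overlaps when a nearer pallet partially covers a farther one; holes that fall inside no pallet box are returned separately so that downstream logic can ignore or flag them.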

Problem

Research questions and friction points this paper is trying to address.

Developing vision systems for pallet detection in warehouse forklift automation

Creating cost-effective perception modules using single standard cameras

Enabling accurate pallet hole mapping for autonomous forklift operations

Innovation

Methods, ideas, or system contributions that make the work stand out.

YOLOv8 and YOLOv11 for pallet detection

Optuna hyperparameter optimization enhances performance

Pallet hole mapping enables spatial representation
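To make the idea of a "spatial representation" concrete, here is a minimal sketch of how a pallet box and its two hole boxes might be turned into quantities a forklift controller could use. The function name `fork_alignment`, the `(x1, y1, x2, y2)` box format, and both output quantities are assumptions made for illustration; the paper's own mapping module is not described in this summary.

```python
def fork_alignment(pallet_box, hole_boxes, image_width):
    """Summarize a pallet and its two holes as simple alignment cues.

    Boxes are (x1, y1, x2, y2) tuples in pixels (assumed format).
    Returns None unless exactly two holes and a valid pallet width are given.
    """
    if len(hole_boxes) != 2:
        return None
    pallet_width = pallet_box[2] - pallet_box[0]
    if pallet_width <= 0 or image_width <= 0:
        return None

    # Horizontal centers of the two holes, left to right.
    centers = sorted((b[0] + b[2]) / 2 for b in hole_boxes)
    midpoint = (centers[0] + centers[1]) / 2

    # Offset of the hole-pair midpoint from the image center, in [-1, 1];
    # negative means the pallet entry lies to the left of the camera axis.
    half_width = image_width / 2
    lateral_offset = (midpoint - half_width) / half_width

    # Hole spacing relative to pallet width, a cheap plausibility check.
    hole_spacing_ratio = (centers[1] - centers[0]) / pallet_width

    return {"lateral_offset": lateral_offset, "hole_spacing_ratio": hole_spacing_ratio}
```

A lateral offset near zero would suggest the forks are roughly centered on the pallet entry in the image, while an implausible spacing ratio could be used to reject inconsistent detections. Converting these image-space cues to metric distances would additionally require camera calibration, which this sketch does not attempt.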
