🤖 AI Summary
This study addresses task planning and allocation (TPA) in human-robot collaborative manufacturing under complex, dynamic conditions by proposing a hierarchical, spatially aware approach. At the high level, an efficient buffer-based deep Q-learning method (EBQ) handles task planning, reducing training time and improving performance on production problems with long-term, sparse rewards. At the low level, a path planning-based spatially aware method (SAP) allocates tasks to human-robot resources, taking into account spatial information such as humans' real-time positions and the distances they must move. Experiments on a complex real-time production process in a 3D simulator indicate that the combined EBQ&SAP method effectively addresses human-robot TPA in complex and dynamic production processes.
📝 Abstract
In advanced manufacturing systems, humans and robots collaborate to carry out the production process. Effective task planning and allocation (TPA) is crucial for achieving high production efficiency, yet it remains challenging in complex and dynamic manufacturing environments. The dynamic nature of humans and robots, particularly the need to consider spatial information (e.g., humans' real-time position and the distance they need to move to complete a task), substantially complicates TPA. To address these challenges, we decompose production tasks into manageable subtasks. We then implement a real-time hierarchical human-robot TPA algorithm, comprising a high-level agent for task planning and a low-level agent for task allocation. For the high-level agent, we propose an efficient buffer-based deep Q-learning method (EBQ), which reduces training time and enhances performance in production problems with long-term and sparse reward challenges. For the low-level agent, a path planning-based spatially aware method (SAP) is designed to allocate tasks to the appropriate human-robot resources so that the corresponding sequential subtasks are carried out. We conducted experiments on a complex real-time production process in a 3D simulator. The results demonstrate that our proposed EBQ&SAP method effectively addresses human-robot TPA problems in complex and dynamic production processes.
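
To make the idea of path planning-based, spatially aware allocation more concrete, the following is a minimal sketch and not the authors' SAP implementation. It assumes a 2D occupancy grid, breadth-first search as the path planner, and a simple "shortest path wins" rule among idle resources. All names here (`path_length`, `allocate`, `grid`, `resources`, the `pos` and `busy` fields) are hypothetical and chosen only for illustration; the paper's actual planner, state representation, and allocation criteria may differ.

```python
from collections import deque


def path_length(grid, start, goal):
    """Length of the shortest 4-connected path from start to goal.

    grid is a list of rows; 0 marks a free cell and 1 an obstacle.
    Returns None when the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None


def allocate(grid, subtask_location, resources):
    """Pick the idle resource with the shortest planned path to the subtask.

    Assumption: travel distance is the only allocation criterion here.
    Returns (name, distance), or (None, None) if no idle resource can reach it.
    """
    best_name, best_dist = None, None
    for name, info in resources.items():
        if info["busy"]:
            continue  # skip resources already working on another subtask
        dist = path_length(grid, info["pos"], subtask_location)
        if dist is not None and (best_dist is None or dist < best_dist):
            best_name, best_dist = name, dist
    return best_name, best_dist


# Hypothetical shop floor: two obstacle cells in the middle row.
grid = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# Hypothetical snapshot of real-time positions and availability.
resources = {
    "human_1": {"pos": (0, 0), "busy": False},
    "robot_1": {"pos": (2, 3), "busy": False},
    "human_2": {"pos": (1, 3), "busy": True},
}
chosen, distance = allocate(grid, (2, 1), resources)
print(chosen, distance)
```

In this snapshot, `human_1` needs three moves to reach the subtask and `robot_1` needs two, so the sketch selects `robot_1`. In a hierarchical setup like the one described above, a routine of this kind would be called each time the high-level planner emits the next subtask, using the positions observed at that moment.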