🤖 AI Summary
This work addresses instability in industrial robotic tasks caused by the observation-execution gap that sensing, inference, and control latency introduce when a visuomotor policy is executed. The authors propose a latency-aware framework that coordinates asynchronous inference and execution without modifying the underlying policy architecture or training. By integrating calibrated multimodal sensing, temporally consistent synchronization, a unified communication pipeline, and a teleoperation interface for demonstration collection, the framework supports robust coordination under latency. Its core contribution is an execution strategy that schedules finite-horizon, policy-predicted action sequences according to temporal feasibility, treating visuomotor policy latency explicitly as a system-level issue on industrial robots. Evaluated on a contact-rich industrial assembly task with systematically varied inference latency, the method maintains smooth motion, compliant contact behavior, and consistent task progression across a wide range of latencies. It also reduces idle time and avoids the instability observed with blocking and naive asynchronous baselines.
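The idea of scheduling a predicted action sequence by temporal feasibility can be illustrated with a small sketch. It rests on assumptions not stated in the summary: each action in a chunk is tied to a planned time `obs_time + k * dt` measured from the observation the policy conditioned on, and an action counts as feasible only if that time is still at least `control_lead` seconds in the future. The names `ActionChunk`, `feasible_suffix`, and `control_lead` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ActionChunk:
    """Finite-horizon action sequence predicted from one observation (hypothetical structure)."""
    obs_time: float                       # timestamp of the observation the policy used [s]
    dt: float                             # assumed nominal spacing between consecutive actions [s]
    actions: Sequence[Tuple[float, ...]]  # e.g. Cartesian waypoints; the format is an assumption

    def scheduled_time(self, k: int) -> float:
        """Time at which action k was planned to be executed."""
        return self.obs_time + k * self.dt


def feasible_suffix(
    chunk: ActionChunk, now: float, control_lead: float
) -> List[Tuple[float, Tuple[float, ...]]]:
    """Return (scheduled_time, action) pairs that can still be executed on time.

    Hypothetical feasibility rule: an action is kept only if its scheduled time is no
    earlier than `now + control_lead`, i.e. there is still time to pass it through the
    controller before the moment it was planned for. Earlier actions are stale and dropped.
    An empty result means nothing in this chunk is feasible, so the caller should hold.
    """
    earliest = now + control_lead
    return [
        (chunk.scheduled_time(k), action)
        for k, action in enumerate(chunk.actions)
        if chunk.scheduled_time(k) >= earliest
    ]


if __name__ == "__main__":
    # Eight 1-D actions planned 0.125 s apart from an observation taken at t = 0.
    chunk = ActionChunk(obs_time=0.0, dt=0.125, actions=[(float(k),) for k in range(8)])
    # Inference returned at t = 0.25 and the controller needs 0.125 s of lead time.
    plan = feasible_suffix(chunk, now=0.25, control_lead=0.125)
    print(len(plan), plan[0])  # 5 (0.375, (3.0,))
```

Dropping the stale prefix, rather than replaying the chunk from its first action, keeps the commanded trajectory aligned with where the arm actually is in time; this is one plausible reading of "temporal feasibility," not necessarily the authors' exact rule.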
📝 Abstract
Industrial robots are increasingly deployed in contact-rich construction and manufacturing tasks that involve uncertainty and long-horizon execution. While learning-based visuomotor policies offer a promising alternative to open-loop control, their deployment on industrial platforms is challenged by a large observation-execution gap caused by sensing, inference, and control latency. On industrial platforms this gap is substantially larger than on low-latency research robots because of high-level interfaces and slower closed-loop dynamics, making execution timing a critical system-level issue. This paper presents a latency-aware framework for deploying and evaluating visuomotor policies on industrial robotic arms under realistic timing constraints. The framework integrates calibrated multimodal sensing, temporally consistent synchronization, a unified communication pipeline, and a teleoperation interface for demonstration collection. Within this framework, we introduce a latency-aware execution strategy that schedules finite-horizon, policy-predicted action sequences based on temporal feasibility, enabling asynchronous inference and execution without modifying policy architectures or training. We evaluate the framework on a contact-rich industrial assembly task while systematically varying inference latency. Using identical policies and sensing pipelines, we compare latency-aware execution with blocking and naive asynchronous baselines. Results show that latency-aware execution maintains smooth motion, compliant contact behavior, and consistent task progression across a wide range of latencies while reducing idle time and avoiding instability observed in baseline methods. These findings highlight the importance of explicitly handling latency for reliable closed-loop deployment of visuomotor policies on industrial robots.
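To make the trade-off between the three execution modes concrete, the toy timing model below compares them per action chunk. It is an illustrative assumption, not the paper's evaluation: inference latency is constant, controller delay is folded into it, each chunk holds `horizon` actions spaced `dt` apart starting at its observation time, and "naive asynchronous" is taken to mean starting every new chunk at its first action. The function name `compare_strategies` and its metrics are hypothetical.

```python
import math
from typing import Dict


def compare_strategies(latency: float, dt: float, horizon: int) -> Dict[str, Dict[str, float]]:
    """Toy steady-state timing model of three ways to execute predicted action chunks.

    Hypothetical assumptions, not taken from the paper: inference latency is constant,
    each chunk holds `horizon` actions spaced `dt` seconds apart starting at its
    observation time, and controller delay is folded into `latency`.
    """
    chunk_duration = horizon * dt
    if latency >= chunk_duration:
        raise ValueError("latency exceeds the chunk horizon; no predicted action stays feasible")

    # Blocking: the arm stands still during inference, then plays the whole chunk.
    blocking = {
        "idle_fraction": latency / (latency + chunk_duration),
        "stale_lag_s": 0.0,  # arm did not move, so the chunk still starts from the observed state
        "actions_executed": float(horizon),
    }
    # Naive asynchronous: the arm keeps executing the previous chunk during inference,
    # then starts the new chunk at index 0, replaying actions planned `latency` s ago.
    naive_async = {
        "idle_fraction": 0.0,
        "stale_lag_s": latency,
        "actions_executed": float(horizon),
    }
    # Latency-aware: keep executing the previous chunk, drop the new chunk's prefix whose
    # scheduled times have already passed, and switch at the first feasible action.
    skipped = math.ceil(latency / dt - 1e-9)  # small tolerance guards against float round-off
    latency_aware = {
        "idle_fraction": 0.0,
        "stale_lag_s": 0.0,
        "actions_executed": float(horizon - skipped),
    }
    return {"blocking": blocking, "naive_async": naive_async, "latency_aware": latency_aware}


if __name__ == "__main__":
    # Hypothetical setting: 0.3 s inference latency, 8 Hz actions, 16-step horizon.
    for name, stats in compare_strategies(latency=0.3, dt=0.125, horizon=16).items():
        print(name, stats)
```

Under these assumptions, blocking pays for latency with idle time, naive asynchronous execution pays with actions that lag their plan by the full latency, and latency-aware execution pays only by discarding a few actions at the start of each chunk. The model ignores contact dynamics entirely, so it says nothing about compliance or task success.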