🤖 AI Summary
To address the high computational demands, bandwidth constraints, and strict latency requirements of real-time human-machine interaction in the industrial Metaverse, this paper proposes a task-oriented, edge-cooperative cross-system framework. Methodologically, it decouples digital twins into two functional modules, visual rendering and robotic control, enabling predictive rendering and remote device pre-control. It further introduces HITL-MAML, a human-in-the-loop, model-agnostic meta-learning algorithm that dynamically optimizes the behavioral prediction horizon to improve generalization and response accuracy. Combined with task-level weight optimization and real-time feedback, the framework achieves a weighted RMSE of 0.0101 m in a trajectory-drawing task. In 3D reconstruction for nuclear facility dismantling, it attains PSNR = 22.11, SSIM = 0.8729, and LPIPS = 0.1298, demonstrating both spatial precision and visual fidelity under high-risk operational conditions.
📝 Abstract
Real-time human-device interaction in the industrial Metaverse faces challenges such as high computational load, limited bandwidth, and strict latency requirements. This paper proposes a task-oriented, edge-assisted cross-system framework using digital twins (DTs) to enable responsive interactions. By predicting operator motions, the system supports: 1) proactive Metaverse rendering for visual feedback, and 2) preemptive control of remote devices. The DTs are decoupled into two virtual functions, visual display and robotic control, optimizing both performance and adaptability. To enhance generalizability, we introduce the Human-In-The-Loop Model-Agnostic Meta-Learning (HITL-MAML) algorithm, which dynamically adjusts prediction horizons. Evaluation on two tasks demonstrates the framework's effectiveness: in a Trajectory-Based Drawing Control task, it reduces weighted RMSE from 0.0712 m to 0.0101 m; in a real-time 3D scene representation task for nuclear decommissioning, it achieves a PSNR of 22.11, SSIM of 0.8729, and LPIPS of 0.1298. These results show the framework's capability to ensure spatial precision and visual fidelity in real-time, high-risk industrial environments.
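To make the HITL-MAML idea concrete, below is a minimal, illustrative sketch (not the paper's implementation): a first-order MAML-style meta-update for a toy 1-D motion predictor, plus a simple human-in-the-loop rule that grows or shrinks the prediction horizon based on observed error. All names (`maml_step`, `adjust_horizon`, the linear predictor `x + h*v*theta`, the step sizes, and the error threshold) are assumptions chosen for clarity, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(theta, x, v, h, y):
    """Squared-error loss for the toy predictor x + h*v*theta,
    and its gradient w.r.t. the scalar gain theta."""
    err = x + h * v * theta - y
    return np.mean(err**2), np.mean(2.0 * err * h * v)

def maml_step(theta, tasks, h, alpha=0.01, beta=0.01):
    """One first-order MAML outer update over a batch of motion tasks:
    adapt on each task's support half, accumulate query gradients."""
    meta_grad = 0.0
    for x, v, y in tasks:
        _, g = loss_grad(theta, x[:5], v[:5], h, y[:5])      # inner step
        theta_i = theta - alpha * g
        _, gq = loss_grad(theta_i, x[5:], v[5:], h, y[5:])   # query grad
        meta_grad += gq
    return theta - beta * meta_grad / len(tasks)

def adjust_horizon(h, rmse, tol=0.05, h_max=10):
    """Toy human-in-the-loop rule: predict further ahead when the
    operator-observed error is small, pull back when it is large."""
    return min(h + 1, h_max) if rmse < tol else max(1, h - 1)

def make_task(h, n=10, gain=1.0):
    """Synthetic motion task: position x, velocity v, and the true
    position h steps ahead under a ground-truth gain."""
    x = rng.uniform(-1, 1, n)
    v = rng.uniform(-1, 1, n)
    return x, v, x + h * v * gain

theta, h = 0.0, 5                    # initial gain and horizon (steps)
for _ in range(200):
    theta = maml_step(theta, [make_task(h) for _ in range(4)], h)
print(round(theta, 3))               # meta-learned gain, near 1.0
```

In this sketch the horizon `h` is held fixed during meta-training for readability; in a live loop, `adjust_horizon` would be called each round with the operator-observed RMSE, which is the dynamic-horizon behavior the abstract attributes to HITL-MAML.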