X-Imitator: Spatial-Aware Imitation Learning via Bidirectional Action-Pose Interaction

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses the limitations of existing robotic manipulation approaches, which often decouple or unidirectionally process spatial perception and action generation, hindering effective coordination. To overcome this, we propose X-Imitator, a dual-path framework that introduces a novel bidirectional recurrent coupling mechanism between actions and poses. By employing bidirectional conditional modeling, our method enables continuous mutual refinement between spatial reasoning and motion generation, emulating the human internal forward model. The architecture is modular, allowing seamless integration into diverse visuomotor policies and supporting end-to-end training. Evaluated across 24 simulated tasks and 3 real-world scenarios, X-Imitator significantly outperforms current baselines and explicit pose-guided methods.

📝 Abstract

Effectively handling the interplay between spatial perception and action generation remains a critical bottleneck in robotic manipulation. Existing methods typically treat spatial perception and action execution as decoupled or strictly unidirectional processes, fundamentally restricting a robot's ability to master complex manipulation tasks. To address this, we propose X-Imitator, a versatile dual-path framework that models spatial perception and action execution as a tightly coupled bidirectional loop. By reciprocally conditioning current pose predictions on past actions and vice versa, this framework enables continuous mutual refinement between spatial reasoning and action generation. This joint modeling mimics human internal forward models. Designed as a modular architecture, the system can be seamlessly integrated into various visuomotor policies. Extensive experiments across 24 simulated and 3 real-world tasks demonstrate that our framework significantly outperforms both vanilla policies and prior methods utilizing explicit pose guidance. The code will be open-sourced.
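The reciprocal conditioning described in the abstract can be illustrated with a toy rollout. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not specify network types, dimensions, or losses, so the random linear heads, the names `make_linear`, `apply_head`, and `coupled_rollout`, and the sizes `OBS_DIM`, `POSE_DIM`, `ACT_DIM` are all hypothetical. It shows only the data flow: at each step the pose path reads the previous action, and the action path reads the previous pose.

```python
import math
import random


def make_linear(out_dim, in_dim, rng):
    """Random weight matrix standing in for a learned network (assumption)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_head(weights, inputs):
    """One linear layer followed by tanh, applied to a flat input list."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def coupled_rollout(observations, w_pose, w_act, pose_dim, act_dim):
    """Run both paths over a sequence of observations.

    The pose path sees (observation, previous action); the action path sees
    (observation, previous pose). Both previous outputs start at zero.
    """
    prev_pose = [0.0] * pose_dim
    prev_act = [0.0] * act_dim
    poses, actions = [], []
    for obs in observations:
        pose = apply_head(w_pose, obs + prev_act)   # pose conditioned on past action
        act = apply_head(w_act, obs + prev_pose)    # action conditioned on past pose
        poses.append(pose)
        actions.append(act)
        prev_pose, prev_act = pose, act             # feed outputs to the next step
    return poses, actions


# Hypothetical sizes chosen only for illustration.
OBS_DIM, POSE_DIM, ACT_DIM = 8, 7, 4
rng = random.Random(0)
w_pose = make_linear(POSE_DIM, OBS_DIM + ACT_DIM, rng)
w_act = make_linear(ACT_DIM, OBS_DIM + POSE_DIM, rng)
observations = [[1.0] * OBS_DIM for _ in range(5)]

poses, actions = coupled_rollout(observations, w_pose, w_act, POSE_DIM, ACT_DIM)
```

In a trained system both heads would be learned jointly from demonstrations; here the weights are random, so the outputs carry no meaning beyond showing how each path depends on the other's previous output.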

Problem

Research questions and friction points this paper is trying to address.

spatial perception

action generation

imitation learning

robotic manipulation

bidirectional interaction

Innovation

Methods, ideas, or system contributions that make the work stand out.

bidirectional action-pose interaction

spatial-aware imitation learning

modular visuomotor policy