Towards Proprioception-Aware Embodied Planning for Dual-Arm Humanoid Robots

📅 2025-10-09
🤖 AI Summary
Current multimodal large language models (MLLMs) perform poorly as long-horizon task planners for dual-arm humanoid robots, for two main reasons: there is no simulation platform that systematically supports task evaluation and data collection for such robots, and current MLLMs lack embodiment awareness, which limits their reasoning about which arm to use and where the body is positioned. To address this, we introduce DualTHOR, a dual-arm humanoid simulator with continuous transitions and a contingency mechanism, and Proprio-MLLM, an MLLM that incorporates proprioceptive information through motion-based position embedding and a cross-spatial encoder. This design improves the model's reasoning about its own body configuration and dual-arm selection during planning. Experiments show that, while existing MLLMs struggle in DualTHOR, Proprio-MLLM achieves an average improvement of 19.75% in planning performance. Our work contributes both a simulation platform and a model for advancing high-level planning in humanoid robotics.

📝 Abstract
In recent years, Multimodal Large Language Models (MLLMs) have demonstrated the ability to serve as high-level planners, enabling robots to follow complex human instructions. However, their effectiveness, especially in long-horizon tasks involving dual-arm humanoid robots, remains limited. This limitation arises from two main challenges: (i) the absence of simulation platforms that systematically support task evaluation and data collection for humanoid robots, and (ii) the insufficient embodiment awareness of current MLLMs, which hinders reasoning about dual-arm selection logic and body positions during planning. To address these issues, we present DualTHOR, a new dual-arm humanoid simulator with continuous transitions and a contingency mechanism. Building on this platform, we propose Proprio-MLLM, a model that enhances embodiment awareness by incorporating proprioceptive information with motion-based position embedding and a cross-spatial encoder. Experiments show that, while existing MLLMs struggle in this environment, Proprio-MLLM achieves an average improvement of 19.75% in planning performance. Our work provides both an essential simulation platform and an effective model to advance embodied intelligence in humanoid robotics. The code is available at https://anonymous.4open.science/r/DualTHOR-5F3B.
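The abstract does not say how proprioceptive information enters the model. As a rough illustration only, the sketch below shows one generic way to turn joint readings into extra tokens for a transformer-based MLLM. Everything in it is an assumption rather than the authors' design. The function names (`sinusoidal_embedding`, `proprio_tokens`, `fuse_with_context`) are hypothetical. The "motion-based position embedding" is approximated here by a sinusoidal encoding of each joint's change between two timesteps. The cross-spatial encoder is replaced by plain concatenation with the other tokens.

```python
import numpy as np


def sinusoidal_embedding(values, dim):
    """Encode each continuous scalar in `values` as a `dim`-vector of sines and cosines.

    Transformer-style sinusoidal encoding, applied to real-valued inputs
    (here, joint displacements) instead of integer token positions.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    values = np.asarray(values, dtype=np.float64)[..., None]  # (n, 1)
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)  # (half,)
    angles = values * freqs  # (n, half) via broadcasting
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (n, dim)


def proprio_tokens(prev_state, curr_state, w_proj):
    """Hypothetical: build one token per joint from two consecutive proprioceptive readings.

    prev_state, curr_state: (n_joints,) joint readings at steps t-1 and t.
    w_proj: (n_joints, dim) per-joint projection weights (learned in a real model).
    """
    prev = np.asarray(prev_state, dtype=np.float64)
    curr = np.asarray(curr_state, dtype=np.float64)
    w_proj = np.asarray(w_proj, dtype=np.float64)
    content = curr[:, None] * w_proj  # scale each joint's weight row by its current reading
    motion = sinusoidal_embedding(curr - prev, w_proj.shape[1])  # encode how each joint moved
    return content + motion  # (n_joints, dim)


def fuse_with_context(context_tokens, proprio):
    """Append proprioceptive tokens after the visual/text tokens along the sequence axis."""
    return np.concatenate([context_tokens, proprio], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_joints, dim = 14, 8  # e.g. 7 joints per arm; sizes are illustrative only
    w = rng.normal(size=(n_joints, dim))
    prev = np.zeros(n_joints)
    curr = np.full(n_joints, 0.1)
    tokens = proprio_tokens(prev, curr, w)
    context = rng.normal(size=(5, dim))  # stand-in for image/instruction tokens
    seq = fuse_with_context(context, tokens)
    print(tokens.shape, seq.shape)  # (14, 8) (19, 8)
```

In a trained model, `w_proj` would be learned and the fused sequence would feed the language model's transformer layers. Here the weights are random, so only the shapes and the structure of the computation are meaningful.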
Problem

Research questions and friction points this paper is trying to address.

Developing a simulation platform for evaluating dual-arm humanoid robots
Enhancing embodiment awareness in multimodal language models
Improving planning performance for long-horizon dual-arm tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed the DualTHOR simulator with continuous transitions and a contingency mechanism
Created Proprio-MLLM, which incorporates proprioceptive information through motion-based position embedding
Integrated a cross-spatial encoder for embodiment awareness