ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model

📅 2025-06-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Addressing the challenges of modeling multi-body spatiotemporal dynamics and coordinating the two arms in multi-task bimanual robotic manipulation, this paper proposes ManiGaussian++, an extension of ManiGaussian built around a hierarchical Gaussian world model. The method first generates task-oriented Gaussian Splatting from intermediate visual features to distinguish the acting arm from the stabilizing arm. A leader-follower architecture then models their dynamics separately: the leader predicts the Gaussian deformation caused by the stabilizing arm, and the follower, building on that prediction, generates the physical consequences of the acting arm's movement. Future scene prediction is used to embed these multi-body dynamics into the intermediate visual representation. On 10 simulated tasks, the method outperforms state-of-the-art bimanual manipulation techniques by 20.2%, and on 9 challenging real-world tasks it reaches an average success rate of 60%, indicating clear gains in multi-task bimanual manipulation.

📝 Abstract

Multi-task robotic bimanual manipulation is becoming increasingly popular because it enables sophisticated tasks that require diverse dual-arm collaboration patterns. Compared to unimanual manipulation, bimanual tasks make it harder to understand the multi-body spatiotemporal dynamics. The existing method ManiGaussian pioneered encoding spatiotemporal dynamics into the visual representation via a Gaussian world model for single-arm settings, but it ignores the interaction between multiple embodiments in dual-arm systems, which leads to a significant performance drop. In this paper, we propose ManiGaussian++, an extension of the ManiGaussian framework that improves multi-task bimanual manipulation by capturing multi-body scene dynamics through a hierarchical Gaussian world model. Specifically, we first generate task-oriented Gaussian Splatting from intermediate visual features, which aims to differentiate the acting and stabilizing arms for multi-body spatiotemporal dynamics modeling. We then build a hierarchical Gaussian world model with a leader-follower architecture, in which the multi-body spatiotemporal dynamics are mined for the intermediate visual representation via future scene prediction. The leader predicts the Gaussian Splatting deformation caused by motions of the stabilizing arm, based on which the follower generates the physical consequences resulting from the movement of the acting arm. As a result, our method outperforms current state-of-the-art bimanual manipulation techniques by 20.2% on 10 simulated tasks and achieves an average success rate of 60% on 9 challenging real-world tasks. Our code is available at https://github.com/April-Yz/ManiGaussian_Bimanual.
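The leader-follower ordering described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption for illustration, not the authors' implementation (their repository is the reference for that): Gaussians are reduced to a 3D mean and an opacity, the learned leader and follower networks are replaced by a hypothetical `toy_deformer` that rigidly moves every Gaussian near a gripper, and `ArmAction`, `hierarchical_predict`, and the other names are invented. What it does show is the hierarchy: the leader deforms the scene for the stabilizing arm first, and the follower is conditioned on that predicted scene when it adds the effect of the acting arm.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Gaussian:
    """Simplified Gaussian primitive; real splats also carry covariance and color."""
    mean: Vec3
    opacity: float


@dataclass(frozen=True)
class ArmAction:
    """Hypothetical per-step arm command: gripper position and its translation."""
    position: Vec3
    delta: Vec3


# A deformer maps (current Gaussians, arm action) to one displacement per Gaussian.
# In a real world model these would be learned networks; here they are toy stand-ins.
Deformer = Callable[[List[Gaussian], ArmAction], List[Vec3]]


def translate(g: Gaussian, d: Vec3) -> Gaussian:
    """Return a copy of `g` with its mean shifted by `d`."""
    return replace(g, mean=(g.mean[0] + d[0], g.mean[1] + d[1], g.mean[2] + d[2]))


def toy_deformer(radius: float) -> Deformer:
    """Toy rule: Gaussians within `radius` of the gripper move with it; others stay put."""
    def deform(scene: List[Gaussian], action: ArmAction) -> List[Vec3]:
        offsets: List[Vec3] = []
        for g in scene:
            dist2 = sum((a - b) ** 2 for a, b in zip(g.mean, action.position))
            offsets.append(action.delta if dist2 <= radius ** 2 else (0.0, 0.0, 0.0))
        return offsets
    return deform


def hierarchical_predict(
    scene: List[Gaussian],
    stabilizing: ArmAction,
    acting: ArmAction,
    leader: Deformer,
    follower: Deformer,
) -> Tuple[List[Gaussian], List[Gaussian]]:
    """Predict the next scene in two stages: leader (stabilizing arm), then follower (acting arm)."""
    # Stage 1: the leader predicts the deformation caused by the stabilizing arm.
    intermediate = [translate(g, d) for g, d in zip(scene, leader(scene, stabilizing))]
    # Stage 2: the follower sees the leader's prediction and adds the acting arm's effect.
    future = [translate(g, d) for g, d in zip(intermediate, follower(intermediate, acting))]
    return intermediate, future


if __name__ == "__main__":
    scene = [Gaussian((0.0, 0.0, 0.0), 1.0), Gaussian((1.0, 0.0, 0.0), 1.0)]
    lift = ArmAction(position=(0.0, 0.0, 0.0), delta=(0.0, 0.0, 0.1))   # stabilizing arm lifts
    slide = ArmAction(position=(0.0, 0.0, 0.1), delta=(0.2, 0.0, 0.0))  # acting arm pushes
    mid, nxt = hierarchical_predict(scene, lift, slide, toy_deformer(0.05), toy_deformer(0.05))
    print([g.mean for g in mid])  # [(0.0, 0.0, 0.1), (1.0, 0.0, 0.0)]
    print([g.mean for g in nxt])  # [(0.2, 0.0, 0.1), (1.0, 0.0, 0.0)]
```

In this toy run the acting arm only reaches the first Gaussian because the leader has already predicted it being lifted to the acting gripper's height; feeding the follower the original scene instead would leave that Gaussian untouched. That dependency is the point of conditioning the follower on the leader's output.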

Problem

Research questions and friction points this paper is trying to address.

Improves bimanual robotic manipulation via hierarchical Gaussian model

Models multi-body spatiotemporal dynamics for dual-arm systems

Enhances task performance by predicting leader-follower arm interactions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Gaussian world model for dynamics

Task-oriented Gaussian Splatting differentiation

Leader-follower architecture for prediction
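The abstract describes generating task-oriented Gaussian Splatting from intermediate visual features so that Gaussians tied to the acting arm can be told apart from those tied to the stabilizing arm. The sketch below covers only the bookkeeping side of that idea, under stated assumptions: it takes hypothetical per-Gaussian role scores (standing in for whatever a learned head over intermediate features would produce), labels each Gaussian with its highest-scoring role, and groups indices by role so that a leader and a follower could each receive their own subset. The three-way `Role` split, the argmax rule, and all function names are illustrative choices, not taken from the paper.

```python
from enum import Enum
from typing import Dict, List, Sequence


class Role(Enum):
    """Hypothetical role labels for Gaussians in a bimanual scene."""
    STABILIZING = 0  # associated with the stabilizing arm
    ACTING = 1       # associated with the acting arm
    SCENE = 2        # everything else


def assign_roles(role_scores: Sequence[Sequence[float]]) -> List[Role]:
    """Pick the highest-scoring role for each Gaussian (one score per Role, in enum order)."""
    roles: List[Role] = []
    for scores in role_scores:
        if len(scores) != len(Role):
            raise ValueError(f"expected {len(Role)} scores, got {len(scores)}")
        best = max(range(len(scores)), key=lambda i: scores[i])
        roles.append(Role(best))
    return roles


def group_by_role(roles: Sequence[Role]) -> Dict[Role, List[int]]:
    """Map each role to the indices of the Gaussians assigned to it."""
    groups: Dict[Role, List[int]] = {role: [] for role in Role}
    for idx, role in enumerate(roles):
        groups[role].append(idx)
    return groups


if __name__ == "__main__":
    scores = [[2.0, 0.1, 0.3], [0.0, 1.5, 0.2], [0.1, 0.2, 0.9], [0.4, 1.1, 0.0]]
    groups = group_by_role(assign_roles(scores))
    print(groups[Role.STABILIZING], groups[Role.ACTING], groups[Role.SCENE])  # [0] [1, 3] [2]
```

A hard argmax keeps the example readable; a learned system might instead keep soft role weights so that the assignment stays differentiable during training.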
