ManiGaussian++: General Robotic Bimanual Manipulation with Hierarchical Gaussian World Model

📅 2025-06-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the difficulty of modeling multi-body spatiotemporal dynamics in multi-task bimanual robotic manipulation, this paper proposes ManiGaussian++, which extends ManiGaussian with a hierarchical Gaussian world model. The method generates task-oriented Gaussian Splatting from intermediate visual features to differentiate the acting and stabilizing arms. It then applies a leader-follower architecture: the leader predicts the Gaussian Splatting deformation caused by the stabilizing arm, and the follower uses that prediction to generate the physical consequences of the acting arm's motion. The visual representation is learned through future scene prediction. On 10 simulated tasks, the method improves on current state-of-the-art bimanual manipulation techniques by 20.2%; on 9 challenging real-world tasks, it achieves a 60% average success rate.

📝 Abstract
Multi-task robotic bimanual manipulation is becoming increasingly popular as it enables sophisticated tasks that require diverse dual-arm collaboration patterns. Compared to unimanual manipulation, bimanual tasks make it harder to understand the multi-body spatiotemporal dynamics. The existing method ManiGaussian pioneers encoding spatiotemporal dynamics into the visual representation via a Gaussian world model for single-arm settings, but it ignores the interaction of multiple embodiments in dual-arm systems, which leads to a significant performance drop. In this paper, we propose ManiGaussian++, an extension of the ManiGaussian framework that improves multi-task bimanual manipulation by digesting multi-body scene dynamics through a hierarchical Gaussian world model. Specifically, we first generate task-oriented Gaussian Splatting from intermediate visual features, which aims to differentiate the acting and stabilizing arms for multi-body spatiotemporal dynamics modeling. We then build a hierarchical Gaussian world model with a leader-follower architecture, where the multi-body spatiotemporal dynamics are mined for the intermediate visual representation via future scene prediction. The leader predicts the Gaussian Splatting deformation caused by motions of the stabilizing arm, through which the follower generates the physical consequences resulting from the movement of the acting arm. As a result, our method outperforms the current state-of-the-art bimanual manipulation techniques by 20.2% in 10 simulated tasks, and achieves a 60% success rate on average in 9 challenging real-world tasks. Our code is available at https://github.com/April-Yz/ManiGaussian_Bimanual.
Problem

Research questions and friction points this paper is trying to address.

Improves bimanual robotic manipulation via hierarchical Gaussian model
Models multi-body spatiotemporal dynamics for dual-arm systems
Enhances task performance by predicting leader-follower arm interactions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Gaussian world model for dynamics
Task-oriented Gaussian Splatting differentiation
Leader-follower architecture for prediction
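
The leader-follower ordering described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the linear `weights` matrices standing in for learned deformation networks, and the restriction to Gaussian centers only are all assumptions made for illustration. The one point it shows is that the follower operates on the leader's output rather than on the original scene.

```python
from typing import List, Sequence

Vec = List[float]


def deform(means: Sequence[Sequence[float]], action: Sequence[float],
           weights: Sequence[Sequence[float]]) -> List[Vec]:
    """Shift each Gaussian center by a linear map of [center, action].

    `weights` has (3 + len(action)) rows and 3 columns. It is a
    hypothetical stand-in for a learned deformation network.
    """
    moved = []
    for center in means:
        features = list(center) + list(action)
        delta = [sum(f * row[k] for f, row in zip(features, weights))
                 for k in range(3)]
        moved.append([c + d for c, d in zip(center, delta)])
    return moved


def predict_future_means(means, stabilizing_action, acting_action,
                         w_leader, w_follower) -> List[Vec]:
    """Apply the leader first, then the follower on the leader's result."""
    # Leader: deformation attributed to the stabilizing arm.
    after_leader = deform(means, stabilizing_action, w_leader)
    # Follower: consequences of the acting arm, conditioned on the leader.
    return deform(after_leader, acting_action, w_follower)


if __name__ == "__main__":
    means = [[0.0, 0.0, 0.0], [0.1, 0.2, 0.3]]
    action_dim = 2  # hypothetical; real arm actions have more dimensions
    zeros = [[0.0] * 3 for _ in range(3 + action_dim)]
    print(predict_future_means(means, [1.0, 0.0], [0.0, 1.0], zeros, zeros))
```

Because the follower consumes `after_leader`, any change in the stabilizing arm's predicted effect propagates into the acting arm's prediction, which is the hierarchical dependency the abstract describes.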
Authors

Tengbo Yu
Tsinghua University
VLA, Computer Vision, Embodied AI

Guanxing Lu
Tsinghua University
VLA, RL, Robotics, 3D Vision

Zaijia Yang
School of Computer Science and Technology, Hainan University

Haoyuan Deng
Nanyang Technological University
Robotics, Imitation Learning, Reinforcement Learning

Season Si Chen
Tsinghua Shenzhen International Graduate School, Tsinghua University

Jiwen Lu
Department of Automation, Tsinghua University

Wenbo Ding
UNIVERSITY AT BUFFALO
security, Machine Learning

Guoqiang Hu
Professor, Nanyang Technological University, Singapore
Optimization and control, AI, Robotics

Yansong Tang
Tsinghua Shenzhen International Graduate School, Tsinghua University

Ziwei Wang
School of Electrical and Electronic Engineering, Nanyang Technological University