HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation

📅 2025-01-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address critical bottlenecks in bimanual robot manipulation—namely, scarcity and annotation difficulty of hand–object interaction data, severe occlusions, and limited viewpoint diversity—this paper introduces the first 3D Gaussian Splatting (3DGS)-based data augmentation framework tailored for bimanual hand–object interaction. Methodologically, we propose a novel mesh-driven joint 3DGS modeling and co-optimization of bimanual hands and objects; integrate a super-resolution rendering module to mitigate blurring induced by multi-scale inputs; and systematically quantify the impact of each augmentation dimension on downstream understanding performance. Evaluated on the H2O and Arctic benchmarks, our synthesized data consistently improves state-of-the-art models’ accuracy, significantly expands pose coverage and viewpoint diversity, and advances bimanual hand–object interaction understanding toward practical deployment.

📝 Abstract
Understanding bimanual hand-object interaction plays an important role in robotics and virtual reality. However, due to severe occlusions between hands and objects as well as high-degree-of-freedom motions, it is challenging to collect and annotate a high-quality, large-scale dataset, which prevents further improvement of bimanual hand-object interaction baselines. In this work, we propose a new 3D Gaussian Splatting (3DGS)-based data augmentation framework for bimanual hand-object interaction, capable of augmenting existing datasets into large-scale photorealistic data with varied hand-object poses and viewpoints. First, we use mesh-based 3DGS to model objects and hands, and we design a super-resolution module to address the rendering blur caused by multi-resolution input images. Second, we extend the single-hand grasping pose optimization module to the bimanual setting to generate diverse bimanual hand-object interaction poses, which significantly expands the pose distribution of the dataset. Third, we analyze the impact of each aspect of the proposed data augmentation on bimanual hand-object interaction understanding. We apply our data augmentation to two benchmarks, H2O and Arctic, and verify that our method improves the performance of the baselines.
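The core augmentation idea in the abstract — expanding a dataset's pose and viewpoint coverage before re-rendering with the 3DGS model — can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the uniform-on-sphere camera sampling, and the Gaussian pose perturbation (as a stand-in for the bimanual grasp-pose optimization) are all assumptions made for illustration.

```python
import numpy as np

def sample_viewpoint(rng, radius=1.0):
    # Hypothetical viewpoint sampler: draw a camera position uniformly
    # on a sphere around the scene center to diversify render views.
    v = rng.normal(size=3)
    v /= np.linalg.norm(v)
    return radius * v

def perturb_hand_pose(pose, rng, sigma=0.05):
    # Stand-in for the paper's bimanual grasp-pose optimization:
    # jitter pose parameters (e.g. MANO-style axis-angle joints)
    # to widen the pose distribution.
    return pose + rng.normal(scale=sigma, size=pose.shape)

def augment(dataset, n_views=4, seed=0):
    # Expand each real sample into n_views synthetic
    # (camera, left-hand pose, right-hand pose) variants,
    # which would then be rendered by the 3DGS model.
    rng = np.random.default_rng(seed)
    augmented = []
    for sample in dataset:
        for _ in range(n_views):
            augmented.append({
                "camera": sample_viewpoint(rng),
                "left_pose": perturb_hand_pose(sample["left_pose"], rng),
                "right_pose": perturb_hand_pose(sample["right_pose"], rng),
            })
    return augmented

base = [{"left_pose": np.zeros(45), "right_pose": np.zeros(45)}]
synth = augment(base, n_views=4)
print(len(synth))  # 4 synthetic variants per original sample
```

In the actual framework the pose variants come from an optimization module rather than random noise, and each variant is rendered photorealistically with the mesh-based 3DGS model plus the super-resolution module.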
Problem

Research questions and friction points this paper is trying to address.

Hand-Object Interaction
Dataset Limitations
Virtual Reality and Robotics
Innovation

Methods, ideas, or system contributions that make the work stand out.

HOGSA
3D Graphics Enhancement
Hand-Object Interaction Diversity
Wentian Qu
Institute of Software, Chinese Academy of Sciences
Jiahe Li
Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences
Jian Cheng
Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences
Jian Shi
Institute of Automation, Chinese Academy of Sciences
Chenyu Meng
Institute of Software, Chinese Academy of Sciences
Cuixia Ma
Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences
Hongan Wang
Institute of Software, Chinese Academy of Sciences, University of Chinese Academy of Sciences
Xiaoming Deng
Institute of Software, Chinese Academy of Sciences
Computer Vision, Robotic Manipulation, Natural User Interfaces, Virtual Humans, Hand Tracking
Yinda Zhang
Google Research
Computer Vision, Computer Graphics, Deep Learning, Scene Understanding, Digital Human