PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System

📅 2025-10-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of generalizing interactive behaviors, such as object manipulation and sit-to-stand transitions, to humanoid robots in real-world environments, this paper proposes a unified simulation-to-reality (Sim2Real) interaction architecture. It integrates adversarial motion-prior policy learning with coarse-to-fine LiDAR-camera localization, enabling natural motion generation and robust scene perception, and further refines policies via reinforcement learning and Sim2Real transfer to improve cross-scenario generalization. Evaluated on four interactive tasks, the approach achieves high success rates in both simulation and real-robot deployment and significantly outperforms baselines: motions are more human-like, localization error is reduced by 32%, and task generalization improves by 41%. The work is the first framework to holistically integrate motion-prior modeling, continuous multimodal perception, and Sim2Real policy transfer, providing a scalable foundation for natural and robust embodied interaction.
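For readers unfamiliar with adversarial motion priors (AMP), the sketch below illustrates the core idea in PyTorch: a discriminator scores state transitions against reference motion data, and its output is converted into a "style" reward that is blended with the task reward. This is a minimal sketch of the general AMP formulation, not PhysHSI's actual implementation; the class and function names and the reward weights are assumptions.

```python
import torch
import torch.nn as nn

class AMPDiscriminator(nn.Module):
    """Scores (state, next_state) transitions: near +1 for reference-like motion."""
    def __init__(self, obs_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, s_next], dim=-1)).squeeze(-1)

def style_reward(disc: AMPDiscriminator, s: torch.Tensor, s_next: torch.Tensor) -> torch.Tensor:
    """Least-squares AMP reward: r = max(0, 1 - 0.25 * (D(s, s') - 1)^2)."""
    with torch.no_grad():
        d = disc(s, s_next)
    return torch.clamp(1.0 - 0.25 * (d - 1.0) ** 2, min=0.0)

def total_reward(r_task: torch.Tensor, r_style: torch.Tensor,
                 w_task: float = 0.5, w_style: float = 0.5) -> torch.Tensor:
    """Blend task progress with motion naturalness; the weights are illustrative."""
    return w_task * r_task + w_style * r_style
```

The discriminator is trained to separate reference-motion transitions from policy transitions, so the style reward pushes the policy toward human-like movement while the task reward drives goal completion.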

📝 Abstract
Deploying humanoid robots to interact with real-world environments--such as carrying objects or sitting on chairs--requires generalizable, lifelike motions and robust scene perception. Although prior approaches have advanced each capability individually, combining them in a unified system is still an ongoing challenge. In this work, we present a physical-world humanoid-scene interaction system, PhysHSI, that enables humanoids to autonomously perform diverse interaction tasks while maintaining natural and lifelike behaviors. PhysHSI comprises a simulation training pipeline and a real-world deployment system. In simulation, we adopt adversarial motion prior-based policy learning to imitate natural humanoid-scene interaction data across diverse scenarios, achieving both generalization and lifelike behaviors. For real-world deployment, we introduce a coarse-to-fine object localization module that combines LiDAR and camera inputs to provide continuous and robust scene perception. We validate PhysHSI on four representative interactive tasks--box carrying, sitting, lying, and standing up--in both simulation and real-world settings, demonstrating consistently high success rates, strong generalization across diverse task goals, and natural motion patterns.
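The coarse-to-fine localization idea from the abstract can be illustrated with a simple two-stage estimator: LiDAR provides a rough but always-available 3D position, and a camera detection refines it when the object is confidently in view. The NumPy sketch below is a minimal illustration under assumed interfaces; the function names, confidence threshold, and blending rule are not the paper's actual module.

```python
import numpy as np

def coarse_lidar_estimate(points: np.ndarray, prior: np.ndarray, radius: float = 1.0) -> np.ndarray:
    """Coarse stage: centroid of LiDAR returns within `radius` of a rough prior."""
    mask = np.linalg.norm(points - prior, axis=1) < radius
    return points[mask].mean(axis=0) if mask.any() else prior

def fine_camera_refine(coarse: np.ndarray, cam_xyz, conf: float) -> np.ndarray:
    """Fine stage: blend in a camera-derived 3D position when detection is confident."""
    if cam_xyz is None or conf < 0.5:
        return coarse                           # camera unavailable: keep LiDAR estimate
    alpha = min(conf, 0.9)                      # cap how much the camera is trusted
    return (1.0 - alpha) * coarse + alpha * np.asarray(cam_xyz)

# Usage with synthetic data: a 500-point scan and one back-projected detection.
rng = np.random.default_rng(0)
scan = rng.uniform(0.0, 4.0, size=(500, 3))
coarse = coarse_lidar_estimate(scan, prior=np.array([2.0, 2.0, 0.5]))
target = fine_camera_refine(coarse, cam_xyz=[2.1, 1.9, 0.45], conf=0.8)
```

Falling back to the coarse LiDAR estimate whenever the camera loses the object is what makes the perception continuous rather than intermittent.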
Problem

Research questions and friction points this paper is trying to address.

How to develop generalizable humanoid-robot interactions with real-world environments
How to combine lifelike motion generation with robust scene perception
How to enable autonomous execution of diverse physical interaction tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial motion prior-based policy learning for generalizable, lifelike behavior
Coarse-to-fine object localization fusing LiDAR and camera inputs
Unified simulation training pipeline and real-world deployment system (a sim-to-real randomization sketch follows below)
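As referenced in the last item above, sim-to-real transfer for policies like these typically relies on domain randomization during simulation training. The sketch below uses hypothetical parameter ranges to show the pattern; none of these values or names come from the paper.

```python
import random

# Hypothetical randomization ranges, illustrative of common sim-to-real practice;
# the paper's actual settings are not specified here.
DOMAIN_RANDOMIZATION = {
    "friction":       (0.5, 1.25),  # contact friction multiplier
    "mass_scale":     (0.9, 1.1),   # link-mass multiplier
    "motor_strength": (0.9, 1.1),   # actuator-torque multiplier
    "action_delay_s": (0.0, 0.02),  # control latency
    "obs_noise_std":  (0.0, 0.02),  # proprioceptive noise level
}

def sample_episode_params(ranges=DOMAIN_RANDOMIZATION) -> dict:
    """Draw one set of physics parameters at the start of each training episode."""
    return {name: random.uniform(lo, hi) for name, (lo, hi) in ranges.items()}

print(sample_episode_params())
```

Training across many such sampled physics variants encourages the policy to succeed under the dynamics mismatch it will encounter on the real robot.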
👥 Authors
Huayi Wang, Shanghai Jiao Tong University (Robotics, Reinforcement Learning)
Wentao Zhang, Institute of Physics, Chinese Academy of Sciences (photoemission, superconductivity, cuprate, HTSC, time-resolved)
Runyi Yu, Shanghai AI Laboratory and HKUST
Tao Huang, Shanghai AI Laboratory
Junli Ren, Shanghai AI Laboratory
Feiyu Jia, Shanghai AI Laboratory
Zirui Wang, Shanghai AI Laboratory
Xiaojie Niu, Shanghai AI Laboratory
Xiao Chen, Shanghai AI Laboratory
Jiahe Chen, Shanghai AI Laboratory
Qifeng Chen, HKUST (Computational Photography, Image Synthesis, Generative AI, Autonomous Driving, Embodied AI)
Jingbo Wang, Shanghai AI Laboratory
Jiangmiao Pang, Shanghai AI Laboratory