PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System

📅 2025-10-13

📈 Citations: 0

✨ Influential: 0

career value

256K/year

🤖 AI Summary

To address the challenge of generalizing interactive behaviors—such as object manipulation and sit-to-stand transitions—for humanoid robots in real-world environments, this paper proposes a unified simulation-to-reality (Sim2Real) interactive architecture. Methodologically, it innovatively integrates adversarial motion-prior policy learning with LiDAR-camera coarse-to-fine multimodal localization, enabling natural motion generation and robust scene perception; action optimization is further enhanced via reinforcement learning and Sim2Real policy transfer to improve cross-scenario generalization. Evaluated on four interactive tasks, the approach achieves high success rates in both simulation and real-robot deployment, significantly outperforming baselines: motions are more human-like, localization error is reduced by 32%, and task generalization improves by 41%. This work constitutes the first framework to holistically integrate motion-prior modeling, continuous multimodal perception, and Sim2Real policy transfer, providing a scalable foundation for natural and robust embodied interaction.

Technology Category

Application Category

📝 Abstract

Deploying humanoid robots to interact with real-world environments--such as carrying objects or sitting on chairs--requires generalizable, lifelike motions and robust scene perception. Although prior approaches have advanced each capability individually, combining them in a unified system is still an ongoing challenge. In this work, we present a physical-world humanoid-scene interaction system, PhysHSI, that enables humanoids to autonomously perform diverse interaction tasks while maintaining natural and lifelike behaviors. PhysHSI comprises a simulation training pipeline and a real-world deployment system. In simulation, we adopt adversarial motion prior-based policy learning to imitate natural humanoid-scene interaction data across diverse scenarios, achieving both generalization and lifelike behaviors. For real-world deployment, we introduce a coarse-to-fine object localization module that combines LiDAR and camera inputs to provide continuous and robust scene perception. We validate PhysHSI on four representative interactive tasks--box carrying, sitting, lying, and standing up--in both simulation and real-world settings, demonstrating consistently high success rates, strong generalization across diverse task goals, and natural motion patterns.

Problem

Research questions and friction points this paper is trying to address.

Develops generalizable humanoid robot interactions with real environments

Combines lifelike motion generation with robust scene perception

Enables autonomous performance of diverse physical interaction tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial motion prior-based policy learning for generalization

Coarse-to-fine object localization with LiDAR and camera

Simulation training pipeline and real-world deployment system

🔎 Similar Papers

No similar papers found.