Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation

📅 2025-10-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address performance degradation in robotic manipulation during sim-to-real transfer caused by dynamics mismatch, this paper proposes Phys2Real, an uncertainty-aware real-to-sim-to-real reinforcement learning pipeline. The pipeline fuses physical priors from vision-language models (VLMs) with online interaction data and conditions policies on interpretable physical parameters. It combines three components: 3D Gaussian splatting for geometric reconstruction, VLM-inferred prior distributions over physical parameters, and ensemble-based uncertainty quantification that refines those parameters online. On planar pushing tasks, Phys2Real outperforms a domain randomization baseline. It reaches a 100% vs 79% success rate on the bottom-weighted T-block and 57% vs 23% on the top-weighted T-block. For hammer pushing, it completes the task 15% faster on average.

📝 Abstract

Learning robotic manipulation policies directly in the real world can be expensive and time-consuming. While reinforcement learning (RL) policies trained in simulation present a scalable alternative, effective sim-to-real transfer remains challenging, particularly for tasks that require precise dynamics. To address this, we propose Phys2Real, a real-to-sim-to-real RL pipeline that combines vision-language model (VLM)-inferred physical parameter estimates with interactive adaptation through uncertainty-aware fusion. Our approach consists of three core components: (1) high-fidelity geometric reconstruction with 3D Gaussian splatting, (2) VLM-inferred prior distributions over physical parameters, and (3) online physical parameter estimation from interaction data. Phys2Real conditions policies on interpretable physical parameters, refining VLM predictions with online estimates via ensemble-based uncertainty quantification. On planar pushing tasks of a T-block with varying center of mass (CoM) and a hammer with an off-center mass distribution, Phys2Real achieves substantial improvements over a domain randomization baseline: 100% vs 79% success rate for the bottom-weighted T-block, 57% vs 23% in the challenging top-weighted T-block, and 15% faster average task completion for hammer pushing. Ablation studies indicate that the combination of VLM and interaction information is essential for success. Project website: https://phys2real.github.io/ .
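The abstract does not spell out the fusion rule. The sketch below is a minimal illustration, not the authors' method. It assumes that the VLM prior and the ensemble's online estimate of one scalar parameter (for example, the T-block's CoM offset) are independent Gaussians, and it combines them by inverse-variance weighting. The spread of the ensemble serves as the online estimate's uncertainty, so whichever source is more certain pulls the fused value toward itself. The names `Gaussian1D`, `ensemble_estimate`, and `fuse` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pvariance
from typing import List


@dataclass
class Gaussian1D:
    """Mean and variance of a scalar physical-parameter estimate."""
    mu: float
    var: float


def ensemble_estimate(predictions: List[float]) -> Gaussian1D:
    """Summarize K ensemble predictions as a Gaussian (hypothetical helper).

    The mean is the ensemble mean, and the variance is the ensemble's
    disagreement, used here as a proxy for epistemic uncertainty.
    """
    if len(predictions) < 2:
        raise ValueError("need at least two ensemble members")
    return Gaussian1D(mean(predictions), pvariance(predictions))


def fuse(prior: Gaussian1D, online: Gaussian1D, eps: float = 1e-9) -> Gaussian1D:
    """Inverse-variance (precision-weighted) fusion of two independent Gaussians.

    This is an assumed fusion rule, not necessarily the one used in Phys2Real.
    """
    w_prior = 1.0 / max(prior.var, eps)    # precision of the VLM prior
    w_online = 1.0 / max(online.var, eps)  # precision of the online estimate
    var = 1.0 / (w_prior + w_online)       # fused variance is below both inputs
    mu = var * (w_prior * prior.mu + w_online * online.mu)
    return Gaussian1D(mu, var)
```

Consider a VLM prior with mean 0.0 and variance 1.0, plus two ensemble members that predict 1.0 and 3.0 (mean 2.0, variance 1.0). For these inputs, `fuse` returns mean 1.0 and variance 0.5. Equal confidence gives both sources equal weight, and the fused variance is smaller than either input. As interaction data accumulates and the ensemble members agree, the online term dominates.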

Problem

Research questions and friction points this paper is trying to address.

Enabling robust sim-to-real transfer for robotic manipulation tasks

Estimating physical parameters through vision-language models and online adaptation

Addressing dynamics uncertainty in planar pushing with varying mass distributions
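The difference between the domain randomization baseline and a parameter-conditioned policy comes down to what the policy observes. The sketch below is generic and hypothetical: `COM_RANGE`, `reset_episode`, and `observation` are illustrative names, and the range values are not taken from the paper. In both setups, the simulator draws a new CoM offset each episode. A domain-randomized policy never sees that value and must work across the whole range. A parameter-conditioned policy receives it as an extra input: the true value in simulation, and an estimate (for example, a fused VLM and online estimate) on the real robot.

```python
import random
from typing import List

# Hypothetical CoM offset range in metres along the object's long axis (not from the paper).
COM_RANGE = (-0.03, 0.03)


def reset_episode(rng: random.Random) -> float:
    """Sample the true CoM offset for one simulated training episode."""
    return rng.uniform(*COM_RANGE)


def observation(state: List[float], com: float, conditioned: bool) -> List[float]:
    """Build the policy input.

    conditioned=False: domain randomization; the policy sees only the state.
    conditioned=True: the policy also sees the physical parameter.
    """
    return state + [com] if conditioned else list(state)


rng = random.Random(0)
com = reset_episode(rng)
state = [0.1, 0.2, 0.0]  # hypothetical planar pose (x, y, yaw)
obs_dr = observation(state, com, conditioned=False)  # 3 values
obs_pc = observation(state, com, conditioned=True)   # 4 values, last one is the CoM
```

Conditioning on the parameter lets a single policy specialize its pushing strategy to each mass distribution. The trade-off is that real-world performance then depends on how accurate the estimated parameter is, which is why the uncertainty-aware estimate matters.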

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fuses VLM priors with online adaptation

Uses uncertainty-aware fusion for parameter estimation

Employs 3D Gaussian splatting for reconstruction
