🤖 AI Summary
Robots exhibit poor generalization when manipulating transparent or reflective materials (e.g., glass, metal): sim-to-real transfer methods suffer from visual domain gaps, while real-data-driven approaches are costly and struggle to cover material diversity. To address this, we propose M³A, the first framework to incorporate a physics-based imaging model into robotic manipulation. Given a single real-world demonstration, M³A enables material-decoupled visuomotor policy learning via differentiable photometric re-rendering and physically consistent illumination modeling. It synthesizes high-fidelity, multi-material training data and establishes the first cross-domain (sim-to-real) benchmark for multi-material manipulation. Evaluated on three real-world tasks, M³A achieves an average success-rate improvement of 58.03% over baselines and demonstrates strong zero-shot generalization to unseen materials.
📝 Abstract
Material generalization is essential for real-world robotic manipulation, where robots must interact with objects exhibiting diverse visual and physical properties. This challenge is particularly pronounced for objects made of glass, metal, or other materials whose transparent or reflective surfaces introduce severe out-of-distribution visual variations. Existing approaches either render materials in simulators and perform sim-to-real transfer, which is hindered by substantial visual domain gaps, or collect extensive real-world demonstrations, which is costly, time-consuming, and still insufficient to cover the diversity of real-world materials. To overcome these limitations, we turn to computational photography and introduce Mutable Material Manipulation Augmentation (M$^3$A), a unified framework that leverages the physical characteristics of materials, as captured by light transport, for photometric re-rendering. The core idea is simple yet powerful: given a single real-world demonstration, we photometrically re-render the scene to generate a diverse set of highly realistic demonstrations with different material properties. This augmentation decouples task-specific manipulation skills from surface appearance, enabling policies to generalize across materials without additional data collection. To systematically evaluate this capability, we construct the first comprehensive multi-material manipulation benchmark spanning both simulation and real-world environments. Extensive experiments show that the M$^3$A policy significantly enhances cross-material generalization, improving the average success rate across three real-world tasks by 58.03% and maintaining robust performance on previously unseen materials.
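The core augmentation idea, re-rendering one demonstration frame under different material appearances, can be sketched with a toy dichromatic image model. Note this is only an illustration of the concept: the function name, the shading/specular decomposition, and the linear image model below are assumptions for this sketch, not the paper's actual physics-based light-transport pipeline.

```python
import numpy as np

def rerender_material(shading, new_albedo, specular=None, spec_strength=0.0):
    """Re-render a frame under a new surface material.

    Toy dichromatic model: I = albedo * shading + spec_strength * specular.
    The real M^3A framework models physically consistent illumination;
    this sketch only swaps appearance terms to produce material variants.
    """
    img = new_albedo * shading            # diffuse term with the new albedo
    if specular is not None:
        img = img + spec_strength * specular  # add a specular residual
    return np.clip(img, 0.0, 1.0)         # keep pixel values in [0, 1]

# Toy example: one demonstration frame, decomposed (hypothetically) into a
# per-pixel shading map and a specular residual, re-rendered as a shiny
# red-tinted material variant.
rng = np.random.default_rng(0)
shading = rng.uniform(0.2, 1.0, size=(4, 4, 1))   # grayscale shading map
specular = rng.uniform(0.0, 1.0, size=(4, 4, 3))  # specular highlights
red_albedo = np.array([0.8, 0.1, 0.1])            # new diffuse color
frame = rerender_material(shading, red_albedo, specular, spec_strength=0.5)
print(frame.shape)  # (4, 4, 3)
```

Repeating this with many sampled albedo/specular settings turns a single recorded demonstration into a multi-material training set, which is the augmentation effect the abstract describes.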