Physically-based Lighting Augmentation for Robotic Manipulation

📅 2025-08-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the poor generalization of robotic imitation learning under varying illumination, this paper proposes the first physics-driven, inverse-rendering-based illumination augmentation framework for behavior cloning. Methodologically, it jointly estimates geometry (depth, surface normals) and material properties (albedo, roughness, metallic) from the first frame of each human demonstration video, then performs controllable illumination transfer via physically-based relighting. The new illumination is propagated through the remaining frames with a fine-tuned Stable Video Diffusion model, preserving structural and motion consistency. Crucially, this work pioneers the integration of inverse rendering into illumination-robust imitation learning. Evaluated on a 7-DoF robotic arm across 720 real-world trials under six lighting conditions, the approach reduces the behavior cloning generalization gap by 40.1%, significantly improving policy transferability and enabling diverse downstream applications.
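
A minimal sketch may make the relighting step concrete. The paper's actual renderer is not detailed in this summary, so the code below assumes a standard Cook-Torrance shading model (GGX distribution, Schlick Fresnel, Smith geometry) applied to the decomposed per-pixel buffers; the function name `relight` and the single-directional-light setup are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def relight(albedo, normal, roughness, metallic,
            light_dir, view_dir, light_rgb, eps=1e-6):
    """Shade per-pixel G-buffers under one directional light.

    A simplified Cook-Torrance sketch; the paper's physically-based
    renderer may differ. albedo: (H, W, 3) in [0, 1]; normal: (H, W, 3)
    unit vectors; roughness, metallic: (H, W, 1) in [0, 1].
    """
    l = light_dir / np.linalg.norm(light_dir)   # light direction
    v = view_dir / np.linalg.norm(view_dir)     # view direction
    h = (l + v) / np.linalg.norm(l + v)         # half vector

    n_dot_l = np.clip(normal @ l, 0.0, 1.0)[..., None]
    n_dot_v = np.clip(normal @ v, 0.0, 1.0)[..., None]
    n_dot_h = np.clip(normal @ h, 0.0, 1.0)[..., None]
    v_dot_h = np.clip(np.dot(v, h), 0.0, 1.0)

    # GGX normal distribution
    alpha2 = (roughness ** 2) ** 2
    d = alpha2 / (np.pi * (n_dot_h**2 * (alpha2 - 1.0) + 1.0) ** 2 + eps)

    # Schlick Fresnel; dielectrics use F0 = 0.04, metals use albedo
    f0 = 0.04 * (1.0 - metallic) + albedo * metallic
    f = f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5

    # Smith geometry term (Schlick-GGX approximation)
    k = (roughness + 1.0) ** 2 / 8.0
    g = (n_dot_l / (n_dot_l * (1.0 - k) + k)) * \
        (n_dot_v / (n_dot_v * (1.0 - k) + k))

    specular = d * f * g / (4.0 * n_dot_l * n_dot_v + eps)
    diffuse = (1.0 - f) * (1.0 - metallic) * albedo / np.pi
    return (diffuse + specular) * light_rgb * n_dot_l
```

Sweeping `light_dir` and `light_rgb` over a set of target conditions then yields multiple relit versions of the same first frame from a single decomposition.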

📝 Abstract
Despite advances in data augmentation, policies trained via imitation learning still struggle to generalize across environmental variations such as lighting changes. To address this, we propose the first framework that leverages physically-based inverse rendering for lighting augmentation on real-world human demonstrations. Specifically, inverse rendering decomposes the first frame in each demonstration into geometric (surface normal, depth) and material (albedo, roughness, metallic) properties, which are then used to render appearance changes under different lighting. To ensure consistent augmentation across each demonstration, we fine-tune Stable Video Diffusion on robot execution videos for temporal lighting propagation. We evaluate our framework by measuring the structural and temporal consistency of the augmented sequences, and by assessing its effectiveness in reducing the behavior cloning generalization gap (40.1%) on a 7-DoF robot across 6 lighting conditions using 720 real-world evaluations. We further showcase three downstream applications enabled by the proposed framework.
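
As a rough sketch of the temporal propagation stage, the snippet below conditions a Stable Video Diffusion pipeline (via the `diffusers` library) on a relit first frame. The checkpoint path is a hypothetical placeholder for the authors' fine-tuned weights; note that the stock pipeline conditions on a single image only, whereas the fine-tuned variant presumably also preserves the original demonstration's motion, a detail the abstract leaves open.

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from PIL import Image

# Hypothetical path to a checkpoint fine-tuned on robot execution
# videos; the stock "stabilityai/stable-video-diffusion-img2vid-xt"
# weights would run the same code, just without the paper's fine-tuning.
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "path/to/finetuned-svd-checkpoint",  # assumption, not from the paper
    torch_dtype=torch.float16,
).to("cuda")

# Relit first frame of the demonstration (output of the inverse-rendering
# and relighting stage), resized to the SVD input resolution.
first_frame = Image.open("relit_first_frame.png").resize((1024, 576))

# Propagate the new lighting across the demonstration: SVD generates a
# temporally consistent clip conditioned on the relit frame.
frames = pipe(first_frame, num_frames=25, decode_chunk_size=8).frames[0]
for i, frame in enumerate(frames):
    frame.save(f"augmented_{i:03d}.png")
```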
Problem

Research questions and friction points this paper is trying to address.

Poor generalization of imitation learning policies across lighting changes
Physically-based lighting augmentation for real-world human demonstrations
Reducing the behavior cloning generalization gap under varied lighting conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Physically-based inverse rendering for lighting augmentation
Fine-tuned Stable Video Diffusion for temporally consistent lighting propagation
Reduces the behavior cloning generalization gap by 40.1% (illustrated below)
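
As a toy illustration of what the 40.1% figure measures, the sketch below computes a relative reduction of a generalization gap. The gap definition (success rate under training lighting minus success rate under unseen lighting) and all success-rate values are hypothetical placeholders; only the relative reduction is reported by the paper.

```python
def generalization_gap(success_train: float, success_unseen: float) -> float:
    """Gap = success under training lighting minus success under unseen lighting."""
    return success_train - success_unseen

# Hypothetical numbers for illustration only -- not reported in the paper.
gap_baseline = generalization_gap(0.90, 0.40)   # 0.50 without augmentation
gap_augmented = generalization_gap(0.90, 0.60)  # 0.30 with augmentation
reduction = (gap_baseline - gap_augmented) / gap_baseline
print(f"relative gap reduction: {reduction:.1%}")  # 40.0% in this toy example
```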
Shutong Jin
KTH Royal Institute of Technology
Lezhong Wang
Technical University of Denmark
Ben Temming
KTH Royal Institute of Technology
Florian T. Pokorny
Associate Professor, KTH Royal Institute of Technology
Machine Learning · Robotics