RoHIL: Robust Human-in-the-Loop Robotic Reinforcement Learning Against Illumination Variations

📅 2026-05-19

🤖 AI Summary
This work addresses the sharp performance degradation of human-in-the-loop reinforcement learning systems when a robot is moved to a new workstation, where illumination changes shift the distribution of visual inputs and naive fine-tuning on shifted-light data triggers catastrophic forgetting of the source workstation. Without requiring additional real-robot interaction, the authors propose RoHIL, an offline fine-tuning framework that combines a world-model-based image relighter, which re-synthesises source-workstation trajectories under multiple virtual HDRI environments, with Illumination-Retention Replay (IRR) and an anchored Bellman-actor regulariser. The approach adapts the policy to the target illumination conditions while limiting drift from the source-workstation policy, removing the need to re-collect data and retrain for every new workstation. Across four real-robot manipulation tasks, RoHIL substantially improves shifted-light performance where standard HIL-RL collapses, while preserving source-workstation performance.
📝 Abstract
Human-in-the-loop reinforcement learning systems achieve near-perfect success on the workstation where they are trained, but collapse when the same robot is moved to a workstation a few meters away due to shifts in the visual input distribution caused by new lamp positions and window light. Re-collecting demonstrations and re-running HIL on every workstation is incompatible with deployment, and naively fine-tuning on shifted-light data triggers catastrophic forgetting of the source workstation. To close this cross-domain gap, we present RoHIL, an offline fine-tuning framework that uses no extra real-robot interaction. RoHIL combines (i) a world-model-based image relighter that re-synthesises the visual stream of source-workstation trajectories under multiple virtual HDRI environments, leaving actions and rewards real; (ii) Illumination-Retention Replay (IRR), a data-level anti-forgetting mechanism that interleaves relit adaptation transitions with original-light retention transitions to preserve source-workstation Bellman coverage; and (iii) an anchored Bellman-actor regulariser that constrains representation and policy drift from the original source-workstation policy. Across four real-robot manipulation tasks under significant cross-workstation illumination variations, RoHIL substantially improves shifted-light performance where standard HIL-RL collapses, while preserving source-workstation performance, eliminating the need to re-collect data and retrain for every new workstation and environment. Project page: https://anonymous4365.github.io/RoHIL/
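The abstract describes IRR only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes a fixed per-batch mixing ratio between relit and original-light transitions, and it stands in a plain squared-L2 parameter penalty for the anchored regulariser, whose exact form the abstract does not give. All names here (`sample_irr_batch`, `anchor_penalty`, `retention_fraction`, the `Transition` record) are hypothetical.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical transition record, e.g. {"obs": ..., "action": ..., "reward": ...}.
Transition = Dict[str, object]


def sample_irr_batch(
    relit: Sequence[Transition],
    retention: Sequence[Transition],
    batch_size: int,
    retention_fraction: float,
    rng: random.Random,
) -> List[Transition]:
    """Draw one mixed batch of relit and original-light transitions.

    Assumption: a fixed fraction of each batch comes from the original-light
    (retention) buffer; the rest comes from the relit (adaptation) buffer.
    """
    if not 0.0 <= retention_fraction <= 1.0:
        raise ValueError("retention_fraction must be in [0, 1]")
    n_retention = round(batch_size * retention_fraction)
    n_relit = batch_size - n_retention
    # Sample with replacement from each buffer.
    batch = [rng.choice(retention) for _ in range(n_retention)]
    batch += [rng.choice(relit) for _ in range(n_relit)]
    rng.shuffle(batch)  # interleave the two sources within the batch
    return batch


def anchor_penalty(
    params: Sequence[float],
    anchor_params: Sequence[float],
    weight: float,
) -> float:
    """Squared-L2 distance to frozen source-policy parameters, scaled by weight.

    Assumption: a simple stand-in for an anchoring term, not the paper's loss.
    """
    return weight * sum((p - a) ** 2 for p, a in zip(params, anchor_params))
```

In a sketch like this, `retention_fraction` is the knob that trades adaptation against forgetting: at 0 the update sees only relit data, and at 1 it sees only source-workstation data.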
Problem

Research questions and friction points this paper is trying to address.

illumination variation
human-in-the-loop reinforcement learning
visual distribution shift
catastrophic forgetting
cross-domain adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-in-the-loop RL
illumination invariance
offline fine-tuning
catastrophic forgetting
image relighting
Shuoqin Zhang
Chongqing University
Yixin Xiong
Chongqing University
Xiru Gao
Chongqing University
Kai Liu
College of Computer Science, Chongqing University
Edge Intelligence, Internet of Vehicles, Autonomous Driving, Pervasive Computing
Ke Wang
Chongqing University
Xichuan Zhou
Chongqing University
Zhe Hu
Chongqing University