🤖 AI Summary
This work addresses the severe performance degradation of existing human-in-the-loop reinforcement learning (HIL-RL) systems when a robot is moved to a new workstation. There, changed illumination shifts the distribution of visual inputs, and standard fine-tuning on the new data often triggers catastrophic forgetting. To overcome this without additional real-world interaction, the authors propose RoHIL, an offline fine-tuning framework with three parts. It uses world-model-based image relighting to re-render source trajectories under multiple virtual HDRI environments. It adds an Illumination-Retention Replay (IRR) mechanism and an anchored Bellman-actor regularization. The approach improves adaptation to the target illumination conditions while preserving source-workstation performance, and it does so without re-collecting demonstrations or re-running HIL training. Across four real-robot manipulation tasks, the method substantially outperforms standard HIL-RL, which collapses under shifted lighting, while preserving performance on the original workstation.
📝 Abstract
Human-in-the-loop reinforcement learning (HIL-RL) systems achieve near-perfect success on the workstation where they are trained, but collapse when the same robot is moved to a workstation a few meters away, because new lamp positions and window light shift the visual input distribution. Re-collecting demonstrations and re-running HIL training on every workstation is incompatible with deployment, and naively fine-tuning on shifted-light data triggers catastrophic forgetting of the source workstation. To close this cross-domain gap, we present RoHIL, an offline fine-tuning framework that uses no extra real-robot interaction. RoHIL combines three components. (i) A world-model-based image relighter re-synthesises the visual stream of source-workstation trajectories under multiple virtual HDRI environments, leaving actions and rewards real. (ii) Illumination-Retention Replay (IRR) is a data-level anti-forgetting mechanism that interleaves relit adaptation transitions with original-light retention transitions to preserve source-workstation Bellman coverage. (iii) An anchored Bellman-actor regulariser constrains representation and policy drift from the original source-workstation policy. Across four real-robot manipulation tasks under significant cross-workstation illumination variations, RoHIL substantially improves shifted-light performance where standard HIL-RL collapses, while preserving source-workstation performance. This removes the need to re-collect demonstrations and re-run HIL training for every new workstation and lighting environment. Project page: https://anonymous4365.github.io/RoHIL/
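The abstract does not give the IRR mixing ratio or the exact form of the anchored regulariser, so the following is only a minimal sketch of the two data-level and loss-level ideas under stated assumptions. It assumes that each IRR minibatch takes a fixed fraction `retention_ratio` of original-light transitions and fills the rest with relit ones. It also assumes the anchor is a squared-L2 penalty on encoder features and policy action means relative to a frozen copy of the source-workstation model; the paper may instead use, for example, a KL term. The names `sample_irr_batch`, `anchored_penalty`, `retention_ratio`, `lam_rep`, and `lam_pi` are hypothetical. NumPy is used only to show the arithmetic; a real training loop would compute the penalty inside an autodiff framework.

```python
import numpy as np


def sample_irr_batch(retention, adaptation, batch_size, retention_ratio, rng):
    """Hypothetical IRR sampler: interleave original-light and relit transitions.

    `retention` and `adaptation` are dicts of arrays with matching keys
    (assumed layout, e.g. 'obs', 'action', 'reward', 'next_obs', 'done'),
    where every array in a dict has the same leading length.
    """
    if not 0.0 <= retention_ratio <= 1.0:
        raise ValueError("retention_ratio must lie in [0, 1]")
    n_ret = int(round(batch_size * retention_ratio))  # original-light rows
    n_ada = batch_size - n_ret                         # relit rows
    idx_ret = rng.integers(0, len(retention["reward"]), size=n_ret)
    idx_ada = rng.integers(0, len(adaptation["reward"]), size=n_ada)
    batch = {
        k: np.concatenate([retention[k][idx_ret], adaptation[k][idx_ada]])
        for k in retention
    }
    # Boolean mask marking retention rows, e.g. for per-domain loss logging.
    batch["is_retention"] = np.concatenate(
        [np.ones(n_ret, dtype=bool), np.zeros(n_ada, dtype=bool)]
    )
    return batch


def anchored_penalty(feat, feat_anchor, act_mean, act_mean_anchor, lam_rep, lam_pi):
    """Hypothetical drift penalty toward a frozen source-workstation model.

    Squared-L2 distance between current and anchor encoder features
    (representation drift) plus squared-L2 distance between current and
    anchor action means (policy drift), each averaged over the batch.
    """
    rep_drift = np.mean(np.sum((feat - feat_anchor) ** 2, axis=-1))
    pol_drift = np.mean(np.sum((act_mean - act_mean_anchor) ** 2, axis=-1))
    return lam_rep * rep_drift + lam_pi * pol_drift


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def make_buffer(n, obs_dim=4, act_dim=2):
        # Random stand-in transitions; real buffers would hold robot data.
        return {
            "obs": rng.normal(size=(n, obs_dim)),
            "action": rng.normal(size=(n, act_dim)),
            "reward": rng.normal(size=n),
            "next_obs": rng.normal(size=(n, obs_dim)),
            "done": np.zeros(n, dtype=bool),
        }

    batch = sample_irr_batch(make_buffer(100), make_buffer(300), 64, 0.5, rng)
    print(batch["obs"].shape, int(batch["is_retention"].sum()))
```

Keeping a fixed share of original-light transitions in every minibatch is one simple way to ensure that source-workstation states keep receiving Bellman updates throughout fine-tuning. The anchor penalty would then be added to the critic and actor losses computed on the same batch.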