RoHIL: Robust Human-in-the-Loop Robotic Reinforcement Learning Against Illumination Variations

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the sharp performance drop of human-in-the-loop reinforcement learning (HIL-RL) systems when a robot is moved to a new workstation, where changed illumination shifts the distribution of visual inputs and naive fine-tuning on shifted-light data triggers catastrophic forgetting of the source workstation. Without any additional real-robot interaction, the authors propose RoHIL, an offline fine-tuning framework that combines a world-model-based image relighter, which re-renders source-workstation trajectories under multiple virtual HDRI environments while keeping the real actions and rewards; Illumination-Retention Replay (IRR), which interleaves relit and original-light transitions; and an anchored Bellman-actor regulariser that limits representation and policy drift from the source policy. The approach improves performance under the shifted illumination while preserving source-workstation performance, and removes the need to re-collect demonstrations and re-run HIL training for each new workstation. Across four real-robot manipulation tasks, RoHIL substantially improves shifted-light performance in settings where standard HIL-RL collapses.

📝 Abstract

Human-in-the-loop reinforcement learning systems achieve near-perfect success on the workstation where they are trained, but collapse when the same robot is moved to a workstation a few meters away due to shifts in the visual input distribution caused by new lamp positions and window light. Re-collecting demonstrations and re-running HIL on every workstation is incompatible with deployment, and naively fine-tuning on shifted-light data triggers catastrophic forgetting of the source workstation. To close this cross-domain gap, we present RoHIL, an offline fine-tuning framework that uses no extra real-robot interaction. RoHIL combines (i) a world-model-based image relighter that re-synthesises the visual stream of source-workstation trajectories under multiple virtual HDRI environments, leaving actions and rewards real; (ii) Illumination-Retention Replay (IRR), a data-level anti-forgetting mechanism that interleaves relit adaptation transitions with original-light retention transitions to preserve source-workstation Bellman coverage; and (iii) an anchored Bellman-actor regulariser that constrains representation and policy drift from the original source-workstation policy. Across four real-robot manipulation tasks under significant cross-workstation illumination variations, RoHIL substantially improves shifted-light performance where standard HIL-RL collapses, while preserving source-workstation performance, eliminating the need to re-collect data and retrain for every new workstation and environment. Project page: https://anonymous4365.github.io/RoHIL/
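The abstract describes IRR only as interleaving relit adaptation transitions with original-light retention transitions, and the anchored regulariser only as a constraint on drift from the source policy. The following is a minimal sketch of how such a mix and such a penalty might look, not the authors' implementation. The names `sample_irr_batch`, `anchor_penalty`, `retention_ratio`, and `weight`, the fixed per-batch ratio, and the squared-error form of the penalty are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Sequence

# Hypothetical transition record, e.g. {"light": ..., "obs": ..., "action": ..., "reward": ...}
Transition = Dict[str, object]


def sample_irr_batch(relit: Sequence[Transition],
                     original: Sequence[Transition],
                     batch_size: int,
                     retention_ratio: float,
                     rng: random.Random) -> List[Transition]:
    """Draw one batch that mixes relit and original-light transitions.

    retention_ratio is an assumed hyperparameter: the fraction of the batch
    sampled (with replacement) from the original-light buffer.
    """
    if not 0.0 <= retention_ratio <= 1.0:
        raise ValueError("retention_ratio must lie in [0, 1]")
    n_retain = int(round(batch_size * retention_ratio))
    n_adapt = batch_size - n_retain
    batch = [rng.choice(original) for _ in range(n_retain)]
    batch += [rng.choice(relit) for _ in range(n_adapt)]
    rng.shuffle(batch)  # interleave the two sources within the batch
    return batch


def anchor_penalty(current_actions: Sequence[Sequence[float]],
                   anchor_actions: Sequence[Sequence[float]],
                   weight: float) -> float:
    """Weighted mean squared difference between the fine-tuned policy's actions
    and those of a frozen copy of the source policy on the same observations.
    This is an assumed form of a drift penalty, not the paper's loss."""
    total, count = 0.0, 0
    for cur, ref in zip(current_actions, anchor_actions):
        for c, r in zip(cur, ref):
            total += (c - r) ** 2
            count += 1
    return weight * total / count if count else 0.0
```

In this sketch, a larger `retention_ratio` or `weight` would favour source-workstation retention over adaptation to the relit data; how RoHIL actually balances the two is not stated in the abstract.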

Problem

Research questions and friction points this paper is trying to address.

illumination variation

human-in-the-loop reinforcement learning

visual distribution shift

catastrophic forgetting

cross-domain adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-in-the-loop RL

illumination invariance

offline fine-tuning

catastrophic forgetting

image relighting

🔎 Similar Papers

Adaptive Task Allocation in Multi-Human Multi-Robot Teams under Team Heterogeneity and Dynamic Information Uncertainty

2024-09-20 · arXiv.org · Citations: 1

Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data

2023-06-06 · International Conference on Learning Representations · Citations: 4