E2HiL: Entropy-Guided Sample Selection for Efficient Real-World Human-in-the-Loop Reinforcement Learning

📅 2026-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the low sample efficiency and frequent human intervention inherent in existing human-in-the-loop reinforcement learning approaches, which incur substantial labor costs. To mitigate these issues, the authors propose an entropy-guided active sample selection mechanism that leverages the influence function of policy entropy to efficiently estimate the impact of individual samples on exploration. By selectively requesting human feedback only on samples with moderate information content, the method effectively filters out both shortcut samples that cause abrupt entropy drops and uninformative noise, thereby improving the exploration–exploitation trade-off. Evaluated on four real-world manipulation tasks, the approach achieves a 42.1% improvement in success rate over the state-of-the-art baseline while reducing human intervention by 10.1%.

📝 Abstract
Human-in-the-loop guidance has emerged as an effective approach for enabling faster convergence in online reinforcement learning (RL) of complex real-world manipulation tasks. However, existing human-in-the-loop RL (HiL-RL) frameworks often suffer from low sample efficiency, requiring substantial human interventions to achieve convergence and thereby incurring high labor costs. To address this, we propose a sample-efficient real-world human-in-the-loop RL framework named E2HiL, which requires fewer human interventions by actively selecting informative samples. Specifically, stably reducing policy entropy enables an improved trade-off between exploration and exploitation with higher sample efficiency. We first build influence functions of different samples on the policy entropy, which are efficiently estimated by the covariance of action probabilities and soft advantages of policies. We then select samples with moderate influence-function values, pruning both shortcut samples that induce sharp entropy drops and noisy samples with negligible effect. Extensive experiments on four real-world manipulation tasks demonstrate that E2HiL achieves a 42.1% higher success rate while requiring 10.1% fewer human interventions compared to the state-of-the-art HiL-RL method, validating its effectiveness. The project page providing code, videos, and mathematical formulations can be found at https://e2hil.github.io/.
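The selection rule described in the abstract can be sketched in a few lines: estimate each sample's influence on policy entropy via a covariance-style product of action probabilities and soft advantages, then keep only samples with moderate influence magnitude. This is a minimal illustrative sketch, not the paper's implementation; the quantile thresholds and the elementwise covariance estimate are assumptions for illustration.

```python
import numpy as np

def entropy_influence(log_probs, soft_advantages):
    """Per-sample proxy for each sample's influence on policy entropy.

    Following the abstract's description, the influence is estimated from
    the covariance between action probabilities and soft advantages; here
    an elementwise centered product serves as that covariance estimate.
    """
    p = np.exp(log_probs)
    return (p - p.mean()) * (soft_advantages - soft_advantages.mean())

def select_moderate(influence, low_q=0.2, high_q=0.8):
    """Return indices of samples with moderate |influence|.

    Samples above the high quantile are treated as shortcut samples that
    would cause sharp entropy drops; samples below the low quantile are
    treated as uninformative noise. Both are pruned. The quantile values
    are illustrative hyperparameters, not taken from the paper.
    """
    mag = np.abs(influence)
    lo, hi = np.quantile(mag, [low_q, high_q])
    mask = (mag >= lo) & (mag <= hi)
    return np.flatnonzero(mask)

# Example: request human feedback only on the selected batch indices.
rng = np.random.default_rng(0)
log_probs = rng.normal(-1.0, 0.3, size=100)       # hypothetical batch
soft_adv = rng.normal(0.0, 1.0, size=100)
selected = select_moderate(entropy_influence(log_probs, soft_adv))
```

Under this sketch, human feedback would be queried only on `selected`, which is how the framework reduces intervention counts while keeping informative samples.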
Problem

Research questions and friction points this paper is trying to address.

human-in-the-loop reinforcement learning
sample efficiency
real-world manipulation
human intervention
policy entropy
Innovation

Methods, ideas, or system contributions that make the work stand out.

entropy-guided
sample selection
human-in-the-loop reinforcement learning
influence function
sample efficiency
Haoyuan Deng
Nanyang Technological University
Robotics, Imitation Learning, Reinforcement Learning
Yuanjiang Xue
Nanyang Technological University, Singapore
Haoyang Du
Nanyang Technological University, Singapore
Boyang Zhou
Nanyang Technological University, Singapore
Zhenyu Wu
Beijing University of Posts and Telecommunications, Beijing, China
Ziwei Wang
School of Electrical and Electronic Engineering, Nanyang Technological University
Embodied AI, Robotics, Computer Vision