Efficient Imitation Without Demonstrations via Value-Penalized Auxiliary Control from Examples

📅 2024-07-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the poor exploration and sample efficiency of reinforcement learning when expert trajectory demonstrations are unavailable, this paper proposes an example-guided value-penalization framework (VPACE). It requires only a small set of successful state examples rather than full expert trajectories, and combines auxiliary control tasks with a value penalty applied above the estimated success level to enable efficient exploration in sparse-reward environments. The key contribution is pairing lightweight state examples with value penalization, avoiding strong assumptions about the expert policy distribution. Evaluated on both simulated and real-robot platforms, the method substantially accelerates convergence while keeping value estimates bounded and stable, and preliminary results suggest it can outperform full-trajectory imitation learning and true sparse-reward baselines in task performance.

📝 Abstract
Common approaches to providing feedback in reinforcement learning are the use of hand-crafted rewards or full-trajectory expert demonstrations. Alternatively, one can use examples of completed tasks, but such an approach can be extremely sample inefficient. We introduce value-penalized auxiliary control from examples (VPACE), an algorithm that significantly improves exploration in example-based control by adding examples of simple auxiliary tasks and an above-success-level value penalty. Across both simulated and real robotic environments, we show that our approach substantially improves learning efficiency for challenging tasks, while maintaining bounded value estimates. Preliminary results also suggest that VPACE may learn more efficiently than the more common approaches of using full trajectories or true sparse rewards. Project site: https://papers.starslab.ca/vpace/ .
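As a rough illustration of the above-success-level value penalty described in the abstract, the sketch below adds a hinge-style penalty to an off-policy critic loss whenever Q-estimates exceed the mean value of the provided success examples. This is a minimal sketch under assumed conventions, not the authors' implementation; the names (value_penalty, q_values, success_q, penalty_weight) and the exact penalty form are illustrative.

```python
# Hedged sketch (not the authors' code): an "above-success-level" value penalty
# added to an off-policy critic loss. q_values, success_q, penalty_weight and
# the squared-hinge form are illustrative assumptions.
import torch

def value_penalty(q_values: torch.Tensor,
                  success_q: torch.Tensor,
                  penalty_weight: float = 1.0) -> torch.Tensor:
    """Penalize Q-estimates that exceed the estimated value of success examples.

    q_values:  critic outputs for sampled (state, action) pairs, shape (B,)
    success_q: critic outputs for the provided success-example states, shape (M,)
    """
    # Treat the mean value of the success examples as an upper bound on
    # achievable return; anything above it is treated as an overestimate.
    success_level = success_q.mean().detach()
    # Hinge: only values above the success level contribute to the penalty.
    overshoot = torch.clamp(q_values - success_level, min=0.0)
    return penalty_weight * (overshoot ** 2).mean()

# Usage: add the penalty to the usual temporal-difference loss of the critic,
# e.g. critic_loss = td_loss + value_penalty(q_pred, q_success)
```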
Problem

Research questions and friction points this paper is trying to address.

How to explore efficiently in example-based control without full expert trajectories
How to learn sample-efficiently from only examples of completed tasks
How to keep value estimates bounded in sparse-reward robotic environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adds a value penalty that keeps critic estimates at or below the success level
Uses simple auxiliary control tasks, each specified by a few success examples, to drive exploration
Improves learning efficiency on simulated and real robotic tasks
🔎 Similar Papers
No similar papers found.