Adaptive Policy Learning to Additional Tasks

📅 2023-05-24
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the problem of incremental adaptation of pre-trained policies to new tasks while preserving performance on original tasks. To this end, we propose Adaptive Policy Gradient (APG), the first method to integrate the Bellman optimality principle into the policy gradient framework, establishing a hybrid optimization paradigm that synergistically combines policy gradient updates with dynamic programming principles. We provide theoretical guarantees showing that APG achieves a convergence rate of O(1/T) and sample complexity of O(1/ε). Empirical evaluation on benchmark control tasks—including CartPole, LunarLander, and Robot Arm—demonstrates that APG attains performance comparable to deterministic policy gradient methods, yet with significantly fewer environment samples and faster convergence. These results highlight substantial improvements in both incremental learning efficiency and generalization stability.
📝 Abstract
This paper develops a policy learning method for tuning a pre-trained policy to adapt to additional tasks without altering the original task. A method named Adaptive Policy Gradient (APG) is proposed in this paper, which combines Bellman's principle of optimality with the policy gradient approach to improve the convergence rate. This paper provides theoretical analysis which guarantees the convergence rate and sample complexity of $mathcal{O}(1/T)$ and $mathcal{O}(1/epsilon)$, respectively, where $T$ denotes the number of iterations and $epsilon$ denotes the accuracy of the resulting stationary policy. Furthermore, several challenging numerical simulations, including cartpole, lunar lander, and robot arm, are provided to show that APG obtains similar performance compared to existing deterministic policy gradient methods while utilizing much less data and converging at a faster rate.
Problem

Research questions and friction points this paper is trying to address.

Adapting pre-trained policies to new tasks without affecting original performance
Improving convergence rate and sample efficiency in policy learning
Validating method effectiveness through challenging robotic simulation environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Policy Gradient method for task adaptation
Combines Bellman optimality with policy gradients
Achieves faster convergence with less data usage
🔎 Similar Papers
No similar papers found.