Maintaining Plasticity in Reinforcement Learning: A Cost-Aware Framework for Aerial Robot Control in Non-stationary Environments

📅 2025-03-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the degradation and collapse of policy plasticity during prolonged training of reinforcement learning (RL) agents in non-stationary environments—particularly in aerial robot control—this paper proposes Retrospective Cost Mechanism (RECOM), the first dynamic learning rate adaptation framework that explicitly models the cost-gradient relationship between reward and loss. RECOM integrates Proximal Policy Optimization (PPO), cost-gradient modeling, adaptive learning rate updates, and wind-disturbance environment modeling. In time-varying wind conditions, it achieves stable hovering without policy collapse throughout training. Neuron dormancy rate decreases by 11.29% compared to L2-regularized PPO, demonstrating superior plasticity preservation. The core contribution is the formalization of policy plasticity maintenance as a cost-gradient-driven dynamic optimization problem—establishing an interpretable and scalable new paradigm for RL control in non-stationary settings.

📝 Abstract
Reinforcement learning (RL) has demonstrated the ability to maintain policy plasticity throughout short-term training in aerial robot control. However, these policies have been shown to lose plasticity when training is extended to long-term learning in non-stationary environments. For example, the standard proximal policy optimization (PPO) policy is observed to collapse in long-term training settings, leading to significant control performance degradation. To address this problem, this work proposes a cost-aware framework that uses a retrospective cost mechanism (RECOM) to balance rewards and losses during RL training in a non-stationary environment. Using a cost-gradient relation between rewards and losses, our framework dynamically updates the learning rate to actively train the control policy in a disturbed wind environment. Our experimental results show that our framework learned a policy for the hovering task without policy collapse under variable wind conditions, with 11.29% fewer dormant units than PPO with L2 regularization.
Problem

Research questions and friction points this paper is trying to address.

Addresses loss of plasticity in long-term RL for aerial robots.
Proposes cost-aware framework to balance rewards and losses.
Improves policy stability in non-stationary wind environments.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cost-aware framework with retrospective cost mechanism
Dynamic learning rate updates for non-stationary environments
Reduced dormant units compared to L2 regularization
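The cost-gradient-driven learning-rate idea above can be sketched as follows. This is an illustrative sketch only: the function name, the exponential update rule, the two-step window, and the rate bounds are assumptions, not the paper's exact RECOM formulation.

```python
import math

def recom_lr_update(lr, reward_hist, loss_hist,
                    eta=0.1, min_lr=1e-5, max_lr=1e-3):
    """Hypothetical retrospective learning-rate update.

    Shrinks the learning rate when the loss rises while the reward
    falls (a sign of degrading training, as in plasticity loss), and
    cautiously raises it when both trends improve. Illustrative only.
    """
    if len(reward_hist) < 2 or len(loss_hist) < 2:
        return lr  # not enough history to form a retrospective signal
    d_reward = reward_hist[-1] - reward_hist[-2]
    d_loss = loss_hist[-1] - loss_hist[-2]
    # Retrospective "cost gradient": positive when loss grows faster
    # than reward, i.e. the policy update is hurting performance.
    cost_grad = d_loss - d_reward
    new_lr = lr * math.exp(-eta * cost_grad)
    # Clamp to keep the step size in a sane range.
    return min(max(new_lr, min_lr), max_lr)
```

In a PPO training loop, such a rule would sit between policy updates and overwrite the optimizer's learning rate, nudging it down as the retrospective cost worsens and back up as training recovers.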
A. T. Karaşahin
Faculty of Engineering, Department of Mechatronics Engineering, Necmettin Erbakan University, Turkey
Ziniu Wu
School of Civil, Aerospace and Design Engineering, University of Bristol, UK
Basaran Bahadir Kocer
University of Bristol
robotics · mechatronics · aerospace · control