🤖 AI Summary
Deep reinforcement learning (DRL) policies for underactuated robotic control often suffer from misalignment with task objectives and poor robustness. Method: This paper proposes a zeroth-order fine-tuning approach based on Separable Natural Evolution Strategies (SNES), applied directly to a pre-trained Soft Actor-Critic (SAC) policy. The SAC agent is first trained with a surrogate reward function that approximates the true scoring metric; the SNES step then optimizes the original task score directly, avoiding the bias of gradient estimation and requiring no policy reparameterization or architectural modification. Contribution/Results: The method significantly improves control accuracy and robustness in complex, dynamic environments. Evaluated on the IROS 2024 RealAIGym competition benchmark, it substantially outperforms baseline methods, achieving state-of-the-art scores and demonstrating both effectiveness and strong generalization capability.
📝 Abstract
Deep Reinforcement Learning (RL) has emerged as a powerful method for addressing complex control problems, particularly those involving underactuated robotic systems. However, in some cases, policies may require refinement to achieve optimal performance and robustness aligned with specific task objectives. In this paper, we propose an approach for fine-tuning Deep RL policies using Evolution Strategies (ES) to enhance control performance for underactuated robots. Our method involves initially training an RL agent with Soft Actor-Critic (SAC) using a surrogate reward function designed to approximate complex task-specific scoring metrics. We subsequently refine this learned policy through a zeroth-order optimization step employing the Separable Natural Evolution Strategy (SNES), directly targeting the original score. Experimental evaluations conducted in the context of the 2nd AI Olympics with RealAIGym at IROS 2024 demonstrate that our evolutionary fine-tuning significantly improves agent performance while maintaining high robustness. The resulting controllers outperform established baselines, achieving competitive scores on the competition tasks.
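The abstract does not include the update rules, but the fine-tuning step it describes can be sketched with the standard SNES recipe: sample Gaussian perturbations of the flattened pre-trained policy parameters, score each candidate with the true (non-differentiable) task metric, and apply rank-based natural-gradient updates to the search mean and per-dimension step sizes. Everything below (function names, learning rates, the small initial `sigma` that keeps the search near the SAC solution) is an illustrative assumption, not the paper's actual implementation.

```python
import numpy as np

def snes_finetune(theta0, score_fn, iters=200, popsize=16, lr_mu=1.0,
                  lr_sigma=None, seed=0):
    """Separable NES sketch: fine-tune theta0 to maximize score_fn.

    theta0   -- flattened pre-trained policy parameters (hypothetical stand-in)
    score_fn -- maps a parameter vector to a scalar episode score
    """
    rng = np.random.default_rng(seed)
    d = theta0.size
    mu = theta0.copy()
    sigma = 0.05 * np.ones(d)  # small init: stay close to the SAC solution
    if lr_sigma is None:
        # common SNES default step size for the per-dimension sigmas
        lr_sigma = (3 + np.log(d)) / (5 * np.sqrt(d))
    # rank-based utilities, shared across iterations (best sample gets most weight)
    ranks = np.arange(1, popsize + 1)
    util = np.maximum(0.0, np.log(popsize / 2 + 1) - np.log(ranks))
    util = util / util.sum() - 1.0 / popsize
    for _ in range(iters):
        z = rng.standard_normal((popsize, d))   # search directions
        candidates = mu + sigma * z             # perturbed policies
        scores = np.array([score_fn(c) for c in candidates])
        zs = z[np.argsort(-scores)]             # sort directions best-first
        mu = mu + lr_mu * sigma * (util @ zs)   # natural-gradient mean update
        sigma = sigma * np.exp(0.5 * lr_sigma * (util @ (zs**2 - 1)))
    return mu
```

In practice `score_fn` would roll out the perturbed policy in the environment and return the competition score; here a cheap synthetic objective suffices to exercise the loop:

```python
theta0 = np.ones(5)                              # pretend pre-trained weights
target = np.array([0.3, -0.2, 0.5, 0.0, 1.2])    # hypothetical optimum
score = lambda th: -np.sum((th - target) ** 2)
theta = snes_finetune(theta0, score, iters=300, popsize=20)
```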