Finetuning Deep Reinforcement Learning Policies with Evolutionary Strategies for Control of Underactuated Robots

📅 2025-07-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep reinforcement learning (DRL) policies for underactuated robotic control often suffer from misalignment between the training reward and the true task objective, as well as limited robustness. Method: This paper proposes a zeroth-order fine-tuning approach based on Separable Natural Evolution Strategies (SNES), operating directly on a pre-trained Soft Actor-Critic (SAC) policy. The SAC agent is first trained with a surrogate reward function designed to approximate the complex competition scoring metric; SNES then refines the resulting policy by optimizing the original task score end-to-end, avoiding biased gradient estimation and requiring no policy reparameterization or architectural modification. Contribution/Results: The method significantly improves control accuracy and robustness in complex, dynamic environments. Evaluated on the 2nd AI Olympics with RealAIGym benchmark at IROS 2024, it outperforms established baselines and achieves competitive scores, demonstrating both effectiveness and strong generalization.

📝 Abstract
Deep Reinforcement Learning (RL) has emerged as a powerful method for addressing complex control problems, particularly those involving underactuated robotic systems. However, in some cases, policies may require refinement to achieve optimal performance and robustness aligned with specific task objectives. In this paper, we propose an approach for fine-tuning Deep RL policies using Evolutionary Strategies (ES) to enhance control performance for underactuated robots. Our method involves initially training an RL agent with Soft Actor-Critic (SAC) using a surrogate reward function designed to approximate complex specific scoring metrics. We subsequently refine this learned policy through a zero-order optimization step employing the Separable Natural Evolution Strategy (SNES), directly targeting the original score. Experimental evaluations conducted in the context of the 2nd AI Olympics with RealAIGym at IROS 2024 demonstrate that our evolutionary fine-tuning significantly improves agent performance while maintaining high robustness. The resulting controllers outperform established baselines, achieving competitive scores for the competition tasks.
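To make the zero-order refinement step concrete, the following is a minimal NumPy sketch of one generic SNES iteration (rank-based fitness shaping plus separable natural-gradient updates of the search mean and per-dimension standard deviations). It is not the authors' implementation; `fitness_fn`, the learning rates, and the population size are illustrative assumptions.

```python
import numpy as np

def snes_step(mu, sigma, fitness_fn, pop_size=16,
              eta_mu=1.0, eta_sigma=0.1, rng=None):
    """One Separable NES step over a flat parameter vector.

    mu, sigma : per-dimension mean and std of the search distribution.
    fitness_fn: maps a candidate parameter vector to a scalar score
                (higher is better), e.g. an episode's task score.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Sample standard-normal perturbations and build candidates.
    s = rng.standard_normal((pop_size, mu.size))
    candidates = mu + sigma * s
    fitness = np.array([fitness_fn(c) for c in candidates])

    # Rank-based utility shaping (rank 0 = best candidate).
    ranks = np.argsort(np.argsort(-fitness))
    u = np.maximum(0.0, np.log(pop_size / 2 + 1) - np.log(ranks + 1))
    u = u / u.sum() - 1.0 / pop_size

    # Natural-gradient updates for a separable Gaussian.
    grad_mu = u @ s
    grad_sigma = u @ (s ** 2 - 1.0)
    mu = mu + eta_mu * sigma * grad_mu
    sigma = sigma * np.exp(0.5 * eta_sigma * grad_sigma)
    return mu, sigma
```

Because only episode scores are needed, the same loop applies whether the fitness is a differentiable reward or a non-differentiable competition metric.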
Problem

Research questions and friction points this paper is trying to address.

How to fine-tune Deep RL policies for underactuated robots
How to enhance control performance using Evolutionary Strategies
How to improve robustness and alignment with task-specific scoring metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tune Deep RL policies with Evolutionary Strategies
Train the initial SAC agent with a surrogate reward approximating the scoring metric
Refine the policy with SNES, directly optimizing the original task score
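Since SNES is a zero-order method, it treats the pre-trained policy purely as a flat parameter vector to perturb and evaluate, with no architectural changes. A minimal sketch of the flattening utilities such a setup needs (generic NumPy code, not the authors' implementation):

```python
import numpy as np

def flatten_params(layers):
    """Concatenate per-layer weight arrays into one flat vector for ES."""
    return np.concatenate([w.ravel() for w in layers])

def unflatten_params(vec, shapes):
    """Split a flat vector back into arrays with the given shapes,
    so a perturbed candidate can be loaded into the policy network."""
    out, i = [], 0
    for shp in shapes:
        n = int(np.prod(shp))
        out.append(vec[i:i + n].reshape(shp))
        i += n
    return out
```

Each SNES candidate is then just `unflatten_params(mu + sigma * noise, shapes)` loaded into the actor network before rolling out an episode.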
Marco Calì
Department of Information Engineering, University of Padova, Via Gradenigo 6/B, Padova, 35131, Italy
Alberto Sinigaglia
PhD student
Deep Reinforcement Learning, Deep Learning
Niccolò Turcato
PhD Student University of Padova
Reinforcement Learning, Robotics
Ruggero Carli
Associate Professor at University of Padova
Control Theory
Gian Antonio Susto
Department of Information Engineering, University of Padova, Via Gradenigo 6/B, Padova, 35131, Italy; Human-Inspired Technology Research Center, University of Padova, Via Luzzatti, 4, Padova, 35121, Italy