Robotic Skill Diversification via Active Mutation of Reward Functions in Reinforcement Learning During a Liquid Pouring Task

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Reinforcement learning often yields insufficient behavioral diversity in robotic skill acquisition. Method: We propose an active reward function mutation framework to autonomously induce diverse manipulation skills in a liquid pouring task. Specifically, we design a Gaussian-noise-driven reward weight mutation mechanism integrated with a human-inspired cost-benefit trade-off model; training is conducted in Isaac Sim using the PPO algorithm on a Franka Panda robot, with a multi-objective reward comprising accuracy, task completion time, and energy consumption. Contribution/Results: The approach spontaneously yields emergent functional skills—including rim-cleaning, liquid mixing, and targeted irrigation—without explicit supervision or task redefinition. These results demonstrate significant improvements in skill generalization and emergence capability, validating the framework’s effectiveness in enabling embodied agents to autonomously expand their behavioral repertoire. This work establishes a novel paradigm for self-driven skill diversification in robotics.

📝 Abstract
This paper explores how deliberate mutations of the reward function in reinforcement learning can produce diversified skill variations in robotic manipulation tasks, examined through a liquid pouring use case. To this end, we developed a new reward function mutation framework based on applying Gaussian noise to the weights of the different terms in the reward function. Inspired by the cost-benefit trade-off model from human motor control, we designed the reward function with three key terms: accuracy, time, and effort. The study was performed in a simulation environment created in NVIDIA Isaac Sim, with a Franka Emika Panda robotic arm holding a glass of liquid to be poured into a container. The reinforcement learning algorithm was based on Proximal Policy Optimization. We systematically explored how different configurations of mutated weights in the reward function affect the learned policy. The resulting policies exhibit a wide range of behaviours: from variations in the execution of the originally intended pouring task to novel skills useful for unexpected tasks, such as container rim cleaning, liquid mixing, and watering. This approach offers promising directions for robotic systems to learn diversified versions of specific tasks, while also potentially deriving meaningful skills for future tasks.
Problem

Research questions and friction points this paper is trying to address.

Diversifying robotic skills through reward function mutations in reinforcement learning
Developing a framework for active mutation of reward function weights using Gaussian noise
Exploring how mutated reward configurations affect policy learning in liquid pouring tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Active mutation of reward function weights
Gaussian noise applied to reward terms
Diversified skills from accuracy-time-effort tradeoff
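The mutation mechanism described above can be sketched as follows: Gaussian noise perturbs the weights of the accuracy, time, and effort terms, and each perturbed weight vector defines a variant reward used to train a policy. This is a minimal illustration, not the authors' implementation; the function names, the noise scale, the sign convention (all terms as penalties), and the weight clipping are assumptions.

```python
import numpy as np

def mutate_reward_weights(base_weights, sigma=0.2, rng=None):
    """Perturb reward-term weights with Gaussian noise (hypothetical sketch).

    base_weights: dict mapping reward-term names to their nominal weights.
    sigma: standard deviation of the Gaussian mutation noise (assumed value).
    """
    rng = rng or np.random.default_rng()
    noise = rng.normal(0.0, sigma, size=len(base_weights))
    # Clip to keep weights non-negative; the paper's exact scheme may differ.
    return {k: max(0.0, w + n)
            for (k, w), n in zip(base_weights.items(), noise)}

def reward(tracking_error, duration, energy, weights):
    """Multi-objective reward from the cost-benefit trade-off model:
    accuracy, time, and effort terms, each treated as a weighted penalty."""
    return -(weights["accuracy"] * tracking_error
             + weights["time"] * duration
             + weights["effort"] * energy)
```

Each mutated weight dictionary would then parameterize one PPO training run, so the population of trained policies reflects different trade-offs among accuracy, speed, and energy use.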
Jannick van Buuren
Cognitive Robotics, Delft University of Technology, Delft, The Netherlands
Roberto Giglio
Department of Mechanical Engineering, Politecnico di Milano, Italy
Loris Roveda
Department of Mechanical Engineering, Politecnico di Milano, Italy; Istituto Dalle Molle di studi sull’intelligenza artificiale (IDSIA USI-SUPSI), Scuola universitaria professionale della Svizzera italiana, DTI, Lugano, Switzerland
Luka Peternel
Delft University of Technology
Teleoperation · Physical Human-Robot Interaction · Robot Learning · Shared Control · Human Motor Control