Dreaming Smoothly and Sample Efficiently with Gradient Penalized Latent Dynamics

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited sample efficiency of existing latent world models, such as DreamerV3, in continuous control, which it links to the absence of any explicit local smoothness constraint on the learned transition dynamics. The proposed method, GPLD, adds local smoothness as an explicit regularizer on the latent dynamics by applying a row-wise Jacobian penalty to the posterior latent distribution, which encourages smoother learned transitions. The penalty is estimated efficiently with Hutchinson-style stochastic probes. On DeepMind Control Suite proprioceptive tasks, the method improves aggregate sample efficiency, reaches high-return behavior earlier on the more challenging quadruped locomotion tasks, and shows more consistent late-stage learning than the baseline.
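The Hutchinson-style estimate mentioned above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: `toy_map`, the weight matrix `W`, and the point `z` are hypothetical stand-ins for whatever latent map the paper differentiates. The sketch relies only on the standard identity that, for random probes v with E[v vᵀ] = I, the expected squared norm of vᵀJ equals the sum of squared row-gradient norms of J.

```python
import math
import random

def toy_map(W, z):
    """Hypothetical stand-in for a latent map: f(z) = tanh(W z)."""
    return [math.tanh(sum(w * x for w, x in zip(row, z))) for row in W]

def jacobian(W, z):
    """Exact Jacobian of toy_map: row i is (1 - f_i^2) * W[i]."""
    f = toy_map(W, z)
    return [[(1.0 - fi * fi) * w for w in row] for fi, row in zip(f, W)]

def exact_rowwise_penalty(W, z):
    """Sum over output rows of the squared gradient norm (squared Frobenius norm)."""
    return sum(g * g for row in jacobian(W, z) for g in row)

def hutchinson_penalty(W, z, num_probes, rng):
    """Monte Carlo estimate of the same quantity using random sign probes.

    For probes v with E[v v^T] = I, E[||v^T J||^2] = trace(J J^T) = ||J||_F^2.
    """
    # Assumption: the toy Jacobian is formed explicitly for clarity. A real
    # model would compute v^T J with one vector-Jacobian product per probe.
    J = jacobian(W, z)
    n_in = len(z)
    total = 0.0
    for _ in range(num_probes):
        v = [rng.choice((-1.0, 1.0)) for _ in J]  # one random sign per output row
        vjp = [sum(v[i] * J[i][j] for i in range(len(J))) for j in range(n_in)]
        total += sum(c * c for c in vjp)
    return total / num_probes

# Hypothetical example values, chosen only for illustration.
W = [[0.5, -1.0], [1.5, 0.25], [-0.75, 0.8]]
z = [0.2, -0.4]
exact = exact_rowwise_penalty(W, z)
estimate = hutchinson_penalty(W, z, num_probes=20000, rng=random.Random(0))
```

With automatic differentiation, each probe costs roughly one backward pass, so a few probes can stand in for one gradient computation per latent dimension. How many probes GPLD uses, and which map it differentiates, is not stated in this summary.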

📝 Abstract

Model-based reinforcement learning improves sample efficiency by learning a world model. However, existing latent world models such as DreamerV3 do not explicitly enforce local smoothness in their learned transition dynamics, leaving a useful inductive bias for transition dynamics learning unexploited. We propose GPLD, a gradient-penalized latent dynamics regularizer for DreamerV3 that applies a row-wise Jacobian penalty to the posterior latent distribution to encourage locally smooth transition learning. We show that this penalty can be interpreted as the continuous-latent analog of finite-difference smoothing of transition laws in discrete embedded-state MDPs, and estimate it efficiently using Hutchinson-style stochastic probes. Empirically, across DeepMind Control proprioceptive tasks, GPLD improves aggregate sample efficiency, with particularly strong gains on higher-complexity locomotion environments. On more challenging quadruped tasks, GPLD reaches high-return behavior earlier and exhibits more consistent late-stage learning over longer horizons. Explicit local smoothness regularization is a simple and effective way to improve latent world models for smooth continuous control environments. Code for GPLD is available at github.com/romils9/gpld-mbrl.
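One way to read the finite-difference connection and the probe-based estimate is sketched below. The symbols are assumptions made for illustration and are not the paper's notation: f is a generic differentiable map of the latent z, J_f(z) is its Jacobian, u is a perturbation direction, and v is a random probe with identity second moment. The first identity shows a squared finite difference converging to a squared directional derivative. The second shows that the sum of squared row-gradient norms equals the squared Frobenius norm, which random probes estimate without bias.

```latex
\[
\lim_{h \to 0} \frac{\lVert f(z + h u) - f(z) \rVert_2^2}{h^2}
  = \lVert J_f(z)\, u \rVert_2^2,
\]
\[
\sum_{i} \lVert \nabla_z f_i(z) \rVert_2^2
  = \lVert J_f(z) \rVert_F^2
  = \mathbb{E}_{v}\!\left[ \lVert v^\top J_f(z) \rVert_2^2 \right],
\qquad \mathbb{E}\!\left[ v v^\top \right] = I.
\]
```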

Problem

Research questions and friction points this paper is trying to address.

latent dynamics

local smoothness

model-based reinforcement learning

sample efficiency

transition dynamics

Innovation

Methods, ideas, or system contributions that make the work stand out.

gradient penalty

latent dynamics

local smoothness