Balancing Plasticity and Stability with Fast and Slow Successor Features

📅 2026-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep reinforcement learning agents struggle to balance stability and plasticity in non-stationary environments, particularly when the environment shifts gradually rather than abruptly. This work modifies MiniWorld and MuJoCo environments to introduce naturalistic, continual non-stationarity, and applies a neuroscience-inspired synaptic consolidation mechanism to successor features, rather than to conventional Q-values, across multiple timescales to improve representational stability. Empirically, this approach outperforms baselines that rely on parameter resetting or other plasticity-oriented strategies, which points to the importance of stable predictive representations in continual learning. The findings further suggest that, in gradually changing environments, prioritizing stability over plasticity yields better performance.
📝 Abstract
A hallmark of intelligence is the ability to adapt in non-stationary environments, yet deep Reinforcement Learning (RL) agents often struggle in such settings. Prior studies introduce non-stationarity through abrupt shifts in features or dynamics, whereas real-world environments often evolve gradually through continual drift. This distinction has important implications for the "stability-plasticity dilemma" in RL, as abrupt task changes may demand more plasticity than naturalistic settings. To address this, we modify existing 3D MiniWorld and MuJoCo environments to incorporate naturalistic, continual non-stationarity, and use them to examine how stability and adaptation affect performance under continuous environmental change. We find that methods favoring stability, such as synaptic consolidation, outperform approaches focused on plasticity, such as parameter resetting. Motivated by this result, and by prior evidence that Successor Features (SFs) reduce interference, we investigate whether SFs are better consolidation targets than Q-values. Across both environments, applying neuro-inspired synaptic consolidation to SFs yields superior performance in continually changing settings. Moreover, consolidation is most effective when SFs are stabilized across multiple timescales, which capture complementary aspects of gradual environmental change. Together, these results suggest that stability is more critical in continual learning when changes are gradual, and that multi-timescale consolidation of predictive representations is an effective approach.
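To make the idea of consolidating SFs at several timescales concrete, here is a minimal tabular sketch in NumPy. It is not the paper's implementation: the abstract does not specify the consolidation rule, so this example assumes a simple stand-in in which several slow copies of the SF table track the fast table as exponential moving averages, and the fast table is pulled back toward their mean. All names (`sf_td_update`, `consolidate`, `q_values`, `run_demo`), the three-state ring environment, and the rates are hypothetical choices made for illustration.

```python
import numpy as np

def sf_td_update(psi, phi, s, s_next, gamma=0.9, lr=0.1):
    """One TD(0) step on tabular successor features, in place.

    psi[s] is moved toward phi[s] + gamma * psi[s_next].
    """
    target = phi[s] + gamma * psi[s_next]
    psi[s] += lr * (target - psi[s])

def consolidate(psi_fast, psi_slow, rates, strength):
    """Assumed consolidation rule (illustrative, not from the paper).

    psi_slow has shape (K, n_states, d): one slow copy per timescale.
    Each slow copy tracks the fast table with its own rate, then the
    fast table is pulled toward the mean of the slow copies.
    """
    for k, rate in enumerate(rates):
        psi_slow[k] += rate * (psi_fast - psi_slow[k])
    psi_fast += strength * (psi_slow.mean(axis=0) - psi_fast)

def q_values(psi, w):
    """Values under reward weights w, using r(s) = phi(s) . w."""
    return psi @ w

def run_demo(steps=3000, rates=(0.1, 0.01), strength=0.05):
    """Fixed-policy walk around a 3-state ring with one-hot features."""
    n = 3
    phi = np.eye(n)                           # one-hot state features
    psi_fast = np.zeros((n, n))               # fast (plastic) SF table
    psi_slow = np.zeros((len(rates), n, n))   # slow (stable) copies
    s = 0
    for _ in range(steps):
        s_next = (s + 1) % n
        sf_td_update(psi_fast, phi, s, s_next)
        consolidate(psi_fast, psi_slow, rates, strength)
        s = s_next
    return psi_fast, psi_slow

if __name__ == "__main__":
    psi_fast, _ = run_demo()
    w = np.array([0.0, 0.0, 1.0])             # reward only in state 2
    print(q_values(psi_fast, w))
```

Because values are read out as `psi @ w`, a drift in the reward weights `w` changes the values without touching the SF table. That separation is one intuition for why a predictive representation might be a reasonable consolidation target. In this sketch the slow copies with a small rate change little over short horizons, while those with a larger rate follow recent drift more closely.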
Problem

Research questions and friction points this paper is trying to address.

stability-plasticity dilemma
continual non-stationarity
reinforcement learning
successor features
continual learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

successor features
synaptic consolidation
continual non-stationarity
stability-plasticity dilemma
multi-timescale learning