🤖 AI Summary
Continual reinforcement learning faces challenges in autonomously detecting environmental shifts, retaining knowledge, and retrieving relevant memories, all without external task labels. This paper introduces an end-to-end differentiable policy optimization framework built around a familiarity-aware autoencoder: the autoencoder learns compact environment representations, and its reconstruction error serves as an unsupervised signal for task boundary detection and selective memory retrieval. Crucially, the method operates without explicit task boundary annotations, enabling multi-task sequential learning and robust re-identification of previously encountered environments. Evaluated on standard continual RL benchmarks, it significantly mitigates catastrophic forgetting. Key contributions are: (1) the first familiarity-aware, end-to-end differentiable continual policy optimization framework; (2) a unified architecture jointly modeling environment identification, change detection, and memory retrieval; and (3) empirical validation of effective continual adaptation under fully unsupervised task delineation, with no task identity signals of any kind.
📝 Abstract
Continual learning for reinforcement learning agents remains a significant challenge, particularly in preserving and leveraging previously acquired knowledge without an external signal to indicate changes in tasks or environments. In this study, we explore the effectiveness of autoencoders in detecting new tasks and matching observed environments to previously encountered ones. Our approach integrates policy optimization with familiarity autoencoders within an end-to-end continual learning system. This system can recognize and learn new tasks or environments while preserving knowledge from earlier experiences, and can selectively retrieve relevant knowledge when re-encountering a known environment. Initial results demonstrate successful continual learning without external signals indicating task changes or re-encounters, showing promise for this methodology.
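The core familiarity mechanism described above can be sketched in a few lines: keep one autoencoder per detected environment, and use reconstruction error to decide whether incoming observations are familiar (retrieve the matching slot) or novel (register a new one). This is a minimal illustration only, not the paper's implementation; the linear autoencoder, the fixed error threshold, and all class and parameter names are assumptions made for the sketch.

```python
import numpy as np


class LinearAutoencoder:
    """Minimal linear autoencoder fit via truncated SVD.

    A stand-in for the paper's learned autoencoder: the SVD of centered
    data gives the optimal linear encoder/decoder for a given latent size.
    """

    def __init__(self, latent_dim):
        self.latent_dim = latent_dim
        self.mean = None
        self.components = None  # shape: (latent_dim, obs_dim)

    def fit(self, X):
        self.mean = X.mean(axis=0)
        # Right singular vectors of the centered data span the best
        # rank-k linear subspace in the least-squares sense.
        _, _, vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.components = vt[: self.latent_dim]
        return self

    def reconstruction_error(self, X):
        Z = (X - self.mean) @ self.components.T        # encode
        X_hat = Z @ self.components + self.mean        # decode
        return float(np.mean((X - X_hat) ** 2))


class FamiliarityMemory:
    """One autoencoder per detected environment (names are assumptions).

    Reconstruction error below `threshold` counts as "familiar"; otherwise
    the observations are treated as a new environment and a fresh
    autoencoder is fit and stored.
    """

    def __init__(self, latent_dim=2, threshold=0.5):
        self.latent_dim = latent_dim
        self.threshold = threshold
        self.autoencoders = []

    def identify(self, X):
        """Return (env_id, is_new) for a batch of observations X."""
        errors = [ae.reconstruction_error(X) for ae in self.autoencoders]
        if errors and min(errors) < self.threshold:
            # Familiar: retrieve the best-matching stored environment.
            return int(np.argmin(errors)), False
        # Novel: register a new environment slot.
        self.autoencoders.append(LinearAutoencoder(self.latent_dim).fit(X))
        return len(self.autoencoders) - 1, True
```

In use, `identify` would gate which policy or memory slot the agent loads: observations from a previously seen environment fall below the error threshold of its stored autoencoder and map back to the same slot, while a genuinely new environment exceeds every stored autoencoder's threshold and triggers a new slot. The full system would additionally train the autoencoder jointly with the policy, which this sketch omits.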