Information-Theoretic Policy Pre-Training with Empowerment

📅 2025-10-07

📈 Citations: 0

✨ Influential: 0

career value

230K/year

🤖 AI Summary

This work addresses the challenge of improving data efficiency and adaptability of reinforcement learning (RL) agents in downstream tasks through unsupervised pretraining. We propose a discounted empowerment framework—grounded in information-theoretic empowerment—that balances short-term responsiveness and long-term environmental controllability over temporal horizons, yielding more generalizable initial policies. To our knowledge, this is the first systematic evaluation of empowerment maximization as a universal pretraining paradigm across diverse RL algorithms, including SAC and PPO. Experiments on continuous-control benchmarks demonstrate that policies pretrained with long-horizon discounted empowerment achieve 2.1× faster downstream convergence on average, improve sample efficiency by reducing required interaction data by 37% for equivalent performance, and enhance cross-task transferability. Our results establish empowerment as a theoretically grounded, task-agnostic, and scalable initialization signal, offering a novel paradigm for unsupervised RL pretraining.

Technology Category

Application Category

📝 Abstract

Empowerment, an information-theoretic measure of an agent's potential influence on its environment, has emerged as a powerful intrinsic motivation and exploration framework for reinforcement learning (RL). Besides for unsupervised RL and skill learning algorithms, the specific use of empowerment as a pre-training signal has received limited attention in the literature. We show that empowerment can be used as a pre-training signal for data-efficient downstream task adaptation. For this we extend the traditional notion of empowerment by introducing discounted empowerment, which balances the agent's control over the environment across short- and long-term horizons. Leveraging this formulation, we propose a novel pre-training paradigm that initializes policies to maximize discounted empowerment, enabling agents to acquire a robust understanding of environmental dynamics. We analyze empowerment-based pre-training for various existing RL algorithms and empirically demonstrate its potential as a general-purpose initialization strategy: empowerment-maximizing policies with long horizons are data-efficient and effective, leading to improved adaptability in downstream tasks. Our findings pave the way for future research to scale this framework to high-dimensional and complex tasks, further advancing the field of RL.

Problem

Research questions and friction points this paper is trying to address.

Using empowerment as pre-training signal for efficient task adaptation

Extending traditional empowerment with discounted horizon balancing

Developing general-purpose policy initialization for improved downstream adaptability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Empowerment used as pre-training signal

Discounted empowerment balances control horizons

Policies initialized to maximize discounted empowerment

🔎 Similar Papers

Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL