Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses credit assignment in offline goal-conditioned reinforcement learning under sparse rewards, where the delay between actions and their long-term consequences hinders learning. The authors propose Occupancy Reward Shaping (ORS), which treats the occupancy measure learned by a generative world model as a source of temporal information, formalizes how that information encodes the geometry of the environment, and uses optimal transport to extract this geometry into a goal-reaching shaping reward. The shaping provably leaves the optimal policy unchanged. Across 13 long-horizon locomotion and manipulation tasks, ORS improves performance by 2.2×. It is also shown to be effective on three real-world tokamak control tasks for nuclear fusion.

📝 Abstract

The temporal lag between actions and their long-term consequences makes credit assignment a challenge when learning goal-directed behaviors from data. Generative world models capture the distribution of future states an agent may visit, indicating that they have captured temporal information. How can that temporal information be extracted to perform credit assignment? In this paper, we formalize how the temporal information stored in world models encodes the underlying geometry of the world. Leveraging optimal transport, we extract this geometry from a learned model of the occupancy measure into a reward function that captures goal-reaching information. Our resulting method, Occupancy Reward Shaping (ORS), largely mitigates the problem of credit assignment in sparse reward settings. ORS provably does not alter the optimal policy, yet empirically improves performance by 2.2x across 13 diverse long-horizon locomotion and manipulation tasks. Moreover, we demonstrate the effectiveness of ORS in the real world for controlling nuclear fusion on 3 tokamak control tasks. Code: https://github.com/aravindvenu7/occupancy_reward_shaping; Website: https://aravindvenu7.github.io/website/ors/
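
The claim that a shaping reward can densify a sparse goal-reaching signal without changing the optimal policy can be illustrated with classic potential-based shaping. The sketch below is not the ORS algorithm: the six-state chain environment, the hand-written distance potential, and every function name are assumptions made for illustration. ORS instead derives its reward from a learned occupancy model via optimal transport, and this page does not describe that construction in enough detail to reproduce it.

```python
# Illustrative only: potential-based reward shaping on a toy chain MDP.
# The environment, potential, and names here are hypothetical, not from ORS.

GAMMA = 0.9
N_STATES = 6
GOAL = N_STATES - 1          # rightmost state is the terminal goal
ACTIONS = (-1, +1)           # step left or step right


def step(s, a):
    """Deterministic transition, clipped to the ends of the chain."""
    return min(max(s + a, 0), GOAL)


def sparse_reward(s, a, s_next):
    """Reward 1 only on the transition that enters the goal."""
    return 1.0 if s_next == GOAL else 0.0


def potential(s):
    """Hand-written potential: negative distance to the goal (0 at the goal)."""
    return -float(GOAL - s)


def shaped_reward(s, a, s_next):
    """Sparse reward plus the shaping term gamma * phi(s') - phi(s)."""
    return sparse_reward(s, a, s_next) + GAMMA * potential(s_next) - potential(s)


def q_iteration(reward_fn, sweeps=200):
    """Tabular Q-value iteration; the goal is terminal, so it has no future value."""
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(sweeps):
        for s in range(GOAL):
            for a in ACTIONS:
                s_next = step(s, a)
                future = 0.0 if s_next == GOAL else max(q[(s_next, b)] for b in ACTIONS)
                q[(s, a)] = reward_fn(s, a, s_next) + GAMMA * future
    return q


def greedy_policy(q):
    """Greedy action for every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]


q_sparse = q_iteration(sparse_reward)
q_shaped = q_iteration(shaped_reward)
policy_sparse = greedy_policy(q_sparse)
policy_shaped = greedy_policy(q_shaped)
print(policy_sparse == policy_shaped)
```

In this toy setting the shaped Q-values differ from the sparse ones only by a per-state offset equal to the negated potential, so the greedy action in every state is the same under both rewards. The shaped reward, however, is nonzero on every step, which is the sense in which shaping eases credit assignment.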

Problem

Research questions and friction points this paper is trying to address.

credit assignment

offline reinforcement learning

goal-conditioned reinforcement learning

sparse reward

temporal lag

Innovation

Methods, ideas, or system contributions that make the work stand out.

Occupancy Reward Shaping

Credit Assignment

Optimal Transport