Offline Reinforcement Learning with Universal Horizon Models

📅 2026-05-15
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses two problems in offline reinforcement learning: iterative model-based rollouts accumulate error, and existing geometric horizon models (GHM) struggle to model distant future states accurately. The paper introduces the Universal Horizon Model (UHM), a generalization of GHM that directly predicts future states at arbitrary horizons. For value learning, the method uses a winsorized horizon distribution, which caps excessively large horizons to stabilize training. Evaluated on 100 OGBench tasks, it outperforms competitive baselines, particularly on tasks with highly suboptimal data and tasks that require long-horizon reasoning.
📝 Abstract
Model-based reinforcement learning (RL) offers a compelling approach to offline RL by enabling value learning on imagined on-policy trajectories. However, it often suffers from compounding errors due to repeated model inference on self-generated states. While geometric horizon models (GHM) alleviate this issue through direct prediction over a discounted infinite-horizon future, they remain challenged in accurately modeling distant future states. To this end, we introduce universal horizon models (UHM), a generalization of GHM that directly predicts future states under arbitrary horizons. Leveraging this flexibility, we propose a scalable value learning method that employs a winsorized horizon distribution to stabilize training by capping excessively large horizons. Experimental results on 100 challenging OGBench tasks demonstrate that the proposed method outperforms competitive baselines, particularly on tasks with highly suboptimal datasets and those requiring long-horizon reasoning. Project page: https://rllab-snu.github.io/projects/UHM/
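The abstract describes the winsorized horizon distribution only as "capping excessively large horizons". The sketch below is one plausible reading, not the authors' implementation: it assumes horizons are drawn from a geometric distribution with success probability 1 - gamma (the usual discounted-future convention) and that winsorizing moves all probability mass above a cap onto the cap itself. The names `sample_winsorized_horizons`, `winsorized_pmf`, `gamma`, and `h_max` are hypothetical and do not come from the paper.

```python
import numpy as np


def sample_winsorized_horizons(gamma, h_max, size, rng):
    """Draw horizons from Geometric(1 - gamma) on {1, 2, ...}, capped at h_max.

    Assumption: the base horizon distribution is geometric and the cap is a
    fixed integer; the paper may define both differently.
    """
    horizons = rng.geometric(p=1.0 - gamma, size=size)
    # Winsorize the upper tail: any horizon above h_max is replaced by h_max,
    # so the tail mass accumulates at the cap instead of being discarded.
    return np.minimum(horizons, h_max)


def winsorized_pmf(gamma, h_max):
    """Closed-form probabilities of the capped distribution on {1, ..., h_max}."""
    h = np.arange(1, h_max + 1)
    pmf = (1.0 - gamma) * gamma ** (h - 1)
    # P(H >= h_max) for a geometric variable on {1, 2, ...} is gamma**(h_max - 1).
    pmf[-1] = gamma ** (h_max - 1)
    return pmf


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = sample_winsorized_horizons(gamma=0.99, h_max=50, size=10_000, rng=rng)
    pmf = winsorized_pmf(gamma=0.99, h_max=50)
    print("largest sampled horizon:", samples.max())
    print("mass placed at the cap:", pmf[-1])
```

Under these assumptions, capping differs from truncation with renormalization: the probability of very long horizons is kept but attributed to the horizon `h_max`, so no sampled horizon exceeds the range over which the model is asked to predict.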
Problem

Research questions and friction points this paper is trying to address.

Offline Reinforcement Learning
Model-based RL
Compounding Errors
Long-horizon Prediction
Geometric Horizon Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Universal Horizon Models
Offline Reinforcement Learning
Model-based RL
Winsorized Horizon Distribution
Long-horizon Reasoning
🔎 Similar Papers