JEDI: Latent End-to-end Diffusion Mitigates Agent-Human Performance Asymmetry in Model-Based Reinforcement Learning

📅 2025-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: On the Atari100k benchmark, pixel-based model-based reinforcement learning (MBRL) exhibits an agent-human performance asymmetry: agents far outperform humans on some tasks yet fall well short on others, and the tasks where agents excel inflate aggregate metrics. Method: The paper first partitions tasks into Agent-Optimal and Human-Optimal subsets to expose this bias. It then proposes Joint Embedding DIffusion (JEDI), a latent diffusion world model trained end-to-end with a self-consistency objective so that its latent space is temporally structured; it combines differentiable pixel-to-latent encoding/decoding with diffusion-based latent dynamics modeling. Results: JEDI outperforms state-of-the-art models on Human-Optimal tasks, stays competitive across Atari100k overall, and runs 3× faster with 43% lower memory than the latest pixel-based diffusion baseline.

📝 Abstract

Recent advances in model-based reinforcement learning (MBRL) have achieved super-human level performance on the Atari100k benchmark, driven by reinforcement learning agents trained on powerful diffusion world models. However, we identify that the current aggregates mask a major performance asymmetry: MBRL agents dramatically outperform humans in some tasks while drastically underperforming in others, with the former inflating the aggregate metrics. This is especially pronounced in pixel-based agents trained with diffusion world models. In this work, we address the pronounced asymmetry observed in pixel-based agents as an initial attempt to reverse the worrying upward trend observed in them. We address the problematic aggregates by delineating all tasks as Agent-Optimal or Human-Optimal and advocate for equal importance on metrics from both sets. Next, we hypothesize that this pronounced asymmetry is due to the lack of a temporally structured latent space trained with the World Model objective in pixel-based methods. Lastly, to address this issue, we propose Joint Embedding DIffusion (JEDI), a novel latent diffusion world model trained end-to-end with a self-consistency objective. JEDI outperforms SOTA models in Human-Optimal tasks while staying competitive across the Atari100k benchmark, and runs 3 times faster with 43% lower memory than the latest pixel-based diffusion baseline. Overall, our work rethinks what it truly means to cross human-level performance in Atari100k.
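
To make the aggregation problem concrete, here is a minimal sketch of how a few very high scores can inflate a mean over tasks. It uses the standard Atari human-normalized score, (agent - random) / (human - random). Everything else is an assumption for illustration: the task names and raw scores are made up, and splitting tasks at a human-normalized score of 1.0 is a hypothetical rule, since the abstract does not say how tasks are assigned to the Agent-Optimal and Human-Optimal sets.

```python
from statistics import mean


def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Standard Atari normalization: 0.0 is random play, 1.0 is human level."""
    return (agent - random) / (human - random)


def split_by_optimality(hns_by_task: dict, threshold: float = 1.0):
    """Hypothetical split: at or above `threshold` counts as Agent-Optimal,
    below it counts as Human-Optimal. The paper's actual rule may differ."""
    agent_optimal = {t: s for t, s in hns_by_task.items() if s >= threshold}
    human_optimal = {t: s for t, s in hns_by_task.items() if s < threshold}
    return agent_optimal, human_optimal


# Made-up raw scores per task: (agent, random, human). Not real benchmark data.
toy_scores = {
    "task_a": (900.0, 0.0, 100.0),
    "task_b": (250.0, 50.0, 150.0),
    "task_c": (30.0, 10.0, 110.0),
    "task_d": (15.0, 5.0, 105.0),
}

hns = {task: human_normalized_score(*raw) for task, raw in toy_scores.items()}
agent_optimal, human_optimal = split_by_optimality(hns)

# One aggregate over all tasks versus one aggregate per subset.
overall_mean = mean(hns.values())
agent_optimal_mean = mean(agent_optimal.values())
human_optimal_mean = mean(human_optimal.values())

print(f"overall mean HNS:       {overall_mean:.3f}")
print(f"Agent-Optimal mean HNS: {agent_optimal_mean:.3f}")
print(f"Human-Optimal mean HNS: {human_optimal_mean:.3f}")
```

With these toy numbers the overall mean is about 2.8, which reads as comfortably super-human, while the Human-Optimal mean is only 0.15. Reporting the two subsets side by side, as the abstract advocates, keeps the second number from being hidden by the first.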

Problem

Research questions and friction points this paper is trying to address.

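Aggregate Atari100k scores mask an asymmetry: MBRL agents far exceed humans on some tasks and fall far short on others

Tasks are not separated into Agent-Optimal and Human-Optimal sets, so metrics from the two sets are not weighted equally

Pixel-based world models lack a temporally structured latent space, which JEDI is designed to provide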

Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint Embedding DIffusion (JEDI) model

End-to-end latent diffusion training

Self-consistency objective optimization

🔎 Similar Papers

Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL