🤖 AI Summary
In the Atari100k benchmark, pixel-based model-based reinforcement learning (MBRL) exhibits an agent–human performance asymmetry: agents significantly outperform humans on some tasks yet fall far short on others, and the former inflate aggregate metrics. Method: First, we categorize tasks into Agent-Optimal and Human-Optimal subsets to expose this bias. Second, we propose Joint Embedding DIffusion (JEDI), a latent diffusion world model that learns a temporally structured latent space through end-to-end training with a self-consistency objective, integrating differentiable pixel-to-latent encoding/decoding with diffusion-based latent dynamics modeling. Results: JEDI outperforms state-of-the-art models on Human-Optimal tasks, stays competitive across the full Atari100k benchmark, and runs 3× faster with 43% lower memory usage than the latest pixel-based diffusion baseline. The work argues for rethinking what it means to exceed human-level performance on Atari100k.
📝 Abstract
Recent advances in model-based reinforcement learning (MBRL) have achieved superhuman performance on the Atari100k benchmark, driven by reinforcement learning agents trained on powerful diffusion world models. However, we identify that the current aggregates mask a major performance asymmetry: MBRL agents dramatically outperform humans in some tasks while drastically underperforming in others, with the former inflating the aggregate metrics. This asymmetry is especially pronounced in pixel-based agents trained with diffusion world models. In this work, we focus on pixel-based agents as an initial attempt to reverse the worrying upward trend in that asymmetry. We address the problematic aggregates by classifying every task as Agent-Optimal or Human-Optimal, and we advocate giving equal weight to metrics from both sets. Next, we hypothesize that the pronounced asymmetry is due to the lack of a temporally structured latent space trained with the world model objective in pixel-based methods. Lastly, to address this issue, we propose Joint Embedding DIffusion (JEDI), a novel latent diffusion world model trained end-to-end with the self-consistency objective. JEDI outperforms SOTA models on Human-Optimal tasks while staying competitive across the Atari100k benchmark, and runs 3 times faster with 43% lower memory than the latest pixel-based diffusion baseline. Overall, our work rethinks what it truly means to cross human-level performance in Atari100k.
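The way a single aggregate can hide this asymmetry can be illustrated with a small sketch. It assumes that scores are human-normalized (1.0 means human level) and that a task counts as Agent-Optimal when the agent's normalized score is at least 1.0; the abstract does not state the exact criterion, so this threshold is an assumption. The task names and numbers are hypothetical and are not results from the paper.

```python
from statistics import mean


def split_by_human_level(hns, threshold=1.0):
    """Partition tasks by human-normalized score (HNS).

    Assumption: a task is "Agent-Optimal" when HNS >= threshold and
    "Human-Optimal" otherwise. The paper's criterion may differ.
    """
    agent_optimal = {task: s for task, s in hns.items() if s >= threshold}
    human_optimal = {task: s for task, s in hns.items() if s < threshold}
    return agent_optimal, human_optimal


def summarize(hns, threshold=1.0):
    """Report the overall mean HNS next to the mean of each subset."""
    agent_optimal, human_optimal = split_by_human_level(hns, threshold)
    return {
        "overall_mean": mean(hns.values()),
        # A subset can be empty, in which case no mean is defined.
        "agent_optimal_mean": mean(agent_optimal.values()) if agent_optimal else None,
        "human_optimal_mean": mean(human_optimal.values()) if human_optimal else None,
    }


# Hypothetical human-normalized scores, chosen only to show the effect.
example_hns = {
    "task_a": 12.0,
    "task_b": 3.5,
    "task_c": 0.4,
    "task_d": 0.1,
    "task_e": 0.2,
}

if __name__ == "__main__":
    for name, value in summarize(example_hns).items():
        print(name, value)
```

In this toy example the overall mean is well above human level even though three of the five tasks are far below it, which is the kind of masking that reporting the two subsets separately is meant to expose.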