Beyond State Consistency: Behavior Consistency in Text-Based World Models

📅 2026-04-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Conventional text-based world models are trained and evaluated with single-step state-matching metrics such as Exact Match, which do not capture whether an agent behaves the same way in the model as it does in the real environment. The authors propose a behavior-aligned training paradigm that explicitly optimizes world models for behavioral consistency: a frozen reference agent is used to compute a Behavior Consistency Reward (BehR), which is combined with the standard state prediction loss during training. On WebShop and TextWorld, the approach improves long-horizon behavioral alignment in several settings, with the clearest gains in WebShop, while preserving or improving single-step prediction quality in three of four settings. It also lowers false positives in offline surrogate evaluation and yields modest gains in inference-time lookahead planning.

📝 Abstract

World models have been emerging as critical components for assessing the consequences of actions generated by interactive agents in online planning and offline evaluation. In text-based environments, world models are typically evaluated and trained with single-step metrics such as Exact Match, aiming to improve the similarity between predicted and real-world states, but such metrics have been shown to be insufficient for capturing actual agent behavior. To address this issue, we introduce a new behavior-aligned training paradigm aimed at improving the functional consistency between the world model and the real environment. This paradigm focuses on optimizing a tractable step-level metric named Behavior Consistency Reward (BehR), which measures how much the likelihood of a logged next action changes between the real state and the world-model-predicted state under a frozen Reference Agent. Experiments on WebShop and TextWorld show that BehR-based training improves long-term alignment in several settings, with the clearest gains in WebShop and less movement in near-ceiling regimes, while preserving or improving single-step prediction quality in three of four settings. World models trained with BehR also achieve lower false positives in offline surrogate evaluation and show modest but encouraging gains in inference-time lookahead planning.
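
The abstract describes BehR only in words, so the following is a minimal sketch of one way such a step-level reward could be computed, not the authors' formulation. It assumes the frozen Reference Agent exposes a score for each candidate action, that the "change in likelihood" is measured as an absolute difference of log-probabilities, and that the reward is combined with the state prediction loss through a scalar weight. The function names, the toy action scores, and the `weight` parameter are all hypothetical.

```python
import math
from typing import Dict


def action_log_prob(scores: Dict[str, float], action: str) -> float:
    """Log-softmax of one action over a dict of candidate action scores.

    `scores` stands in for the output of a frozen reference agent
    conditioned on a state (hypothetical interface).
    """
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in scores.values()))
    return scores[action] - log_z


def behavior_consistency_reward(logp_real: float, logp_pred: float) -> float:
    """Assumed reward: 0 when the logged action is equally likely under the
    real and the predicted state, increasingly negative as they diverge."""
    return -abs(logp_real - logp_pred)


def combined_loss(state_loss: float, reward: float, weight: float = 1.0) -> float:
    """Assumed objective: state prediction loss minus a weighted reward."""
    return state_loss - weight * reward


# Toy example: the reference agent scores two actions under each state.
logged_action = "click[buy]"
scores_real = {"click[buy]": 2.0, "click[back]": 0.0}
scores_pred = {"click[buy]": 0.5, "click[back]": 0.0}  # world-model state

reward = behavior_consistency_reward(
    action_log_prob(scores_real, logged_action),
    action_log_prob(scores_pred, logged_action),
)
print(reward < 0)  # the predicted state shifts the agent's preference
```

Under these assumptions, a predicted state that differs textually from the real one but leaves the reference agent's action likelihood unchanged receives the maximum reward of 0, which is the sense in which the metric targets behavior rather than state matching.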

Problem

Research questions and friction points this paper is trying to address.

world models

behavior consistency

text-based environments

agent behavior

state prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Behavior Consistency

World Models

Behavior Consistency Reward (BehR)