From Word to World: Can Large Language Models be Implicit Text-based World Models?

📅 2025-12-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether large language models (LLMs) can serve as implicit, text-based world models that improve reinforcement learning (RL) efficiency for agents in textual environments. Methodologically, it reframes language modeling as interactive next-state prediction, integrating implicit state representation, action validation, synthetic trajectory generation, RL warm-starting, and scaling-law analysis. The work introduces a three-tier evaluation framework (fidelity and consistency; scalability and robustness; agent utility) to characterize, for the first time, the effective boundaries of LLMs as world models, identifying behavioral coverage and environmental complexity as the key limiting factors. Empirical validation across five canonical text-based environments shows that well-trained LLMs maintain coherent latent states, scale predictably in world-modeling capability with data and model size, and significantly improve agent sample efficiency and asymptotic performance.
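To make the reframing concrete, here is a minimal sketch of one world-model step, treating next-state prediction as plain text completion over a serialized (state, action) pair. The `llm_complete` backend and the prompt format are hypothetical placeholders, not the paper's implementation.

```python
# Minimal sketch of "language modeling as next-state prediction":
# the world model maps a textual (state, action) pair to the next
# observation. `llm_complete` is a stand-in for any trained
# text-completion backend (assumption, not the paper's API).

def llm_complete(prompt: str) -> str:
    # Placeholder backend: a real system would query a trained LLM here.
    return "You are in a dimly lit chamber. A door stands to the north."

def predict_next_state(state: str, action: str) -> str:
    """One world-model step: serialize (state, action), decode the next state."""
    prompt = (
        "Current state:\n" + state + "\n"
        "Action: " + action + "\n"
        "Next state:\n"
    )
    return llm_complete(prompt)

if __name__ == "__main__":
    state = "You stand at the entrance of a cave."
    print(predict_next_state(state, "go north"))
```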

📝 Abstract
Agentic reinforcement learning increasingly relies on experience-driven scaling, yet real-world environments remain non-adaptive, limited in coverage, and difficult to scale. World models offer a potential way to improve learning efficiency through simulated experience, but it remains unclear whether large language models can reliably serve this role and under what conditions they meaningfully benefit agents. We study these questions in text-based environments, which provide a controlled setting to reinterpret language modeling as next-state prediction under interaction. We introduce a three-level framework for evaluating LLM-based world models: (i) fidelity and consistency, (ii) scalability and robustness, and (iii) agent utility. Across five representative environments, we find that sufficiently trained world models maintain coherent latent state, scale predictably with data and model size, and improve agent performance via action verification, synthetic trajectory generation, and warm-starting reinforcement learning. At the same time, these gains depend critically on behavioral coverage and environment complexity, delineating clear boundaries for when world modeling effectively supports agent learning.
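As a rough illustration of how the three evaluation levels could be organized in code, the sketch below groups hypothetical per-level metrics; the field names and the aggregate check are assumptions, not the paper's framework implementation.

```python
from dataclasses import dataclass

@dataclass
class WorldModelEvaluation:
    # Level (i): fidelity and consistency of next-state predictions.
    fidelity: float
    consistency: float
    # Level (ii): scalability with data/model size and robustness
    # to perturbed or out-of-distribution inputs.
    scalability: float
    robustness: float
    # Level (iii): agent utility -- does simulated experience improve
    # sample efficiency and final performance?
    sample_efficiency_gain: float
    final_return_gain: float

    def passes(self, threshold: float = 0.5) -> bool:
        """Crude aggregate check (hypothetical): every level clears the bar."""
        return all(
            score >= threshold
            for score in (self.fidelity, self.consistency, self.scalability,
                          self.robustness, self.sample_efficiency_gain,
                          self.final_return_gain)
        )
```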
Problem

Research questions and friction points this paper is trying to address.

Evaluating whether large language models can serve as reliable text-based world models.
Identifying the conditions under which these models meaningfully benefit agent learning.
Delineating the boundaries of world-modeling effectiveness set by behavioral coverage and environment complexity.
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs as next-state predictors in text environments
Three-level framework for evaluating world model fidelity
World models improve agents via action verification, synthetic trajectory generation, and RL warm-starting (see the sketch below)
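A hypothetical sketch of the first two agent-utility mechanisms named above: action verification gates a proposed action through the world model before it is executed, and synthetic rollouts produce imagined experience that can seed or augment RL training. All function names and prompt formats here are illustrative, not the paper's API.

```python
# Sketch of (a) action verification and (b) synthetic trajectory
# generation with an LLM world model. Names are assumptions.

import random

def verify_action(world_model, state: str, action: str) -> bool:
    """Action verification: reject actions the world model deems inadmissible."""
    verdict = world_model(f"State: {state}\nIs '{action}' valid? (yes/no)\n")
    return verdict.strip().lower().startswith("yes")

def generate_trajectory(world_model, state: str, policy, horizon: int = 8):
    """Synthetic rollout: imagined (state, action, next_state) tuples."""
    trajectory = []
    for _ in range(horizon):
        action = policy(state)
        if not verify_action(world_model, state, action):
            continue  # skip actions the model flags as invalid
        next_state = world_model(f"State: {state}\nAction: {action}\nNext state:\n")
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory

# Toy stand-ins so the sketch runs end to end.
def toy_world_model(prompt: str) -> str:
    return "yes" if "valid" in prompt else "You move into a new room."

def toy_policy(state: str) -> str:
    return random.choice(["go north", "take lamp", "open door"])

if __name__ == "__main__":
    for step in generate_trajectory(toy_world_model, "Cave entrance.", toy_policy):
        print(step)
```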