Learning POMDP World Models from Observations with Language-Model Priors

📅 2026-05-13

🤖 AI Summary

This work addresses the problem of learning world models for partially observable Markov decision processes (POMDPs) from observation-action trajectories alone, a setting where existing methods need many costly environment interactions. The authors propose Pinductor, which uses large language models (LLMs) as priors: the LLM proposes candidate POMDP models from only a few trajectories and iteratively refines them to maximize a belief-based likelihood score. Although it never observes the hidden states, the method matches the performance and sample efficiency of LLM-based methods that have privileged access to them. It is substantially more sample-efficient than tabular POMDP learning baselines, and its performance improves with more capable LLMs, which makes it a practical route to sample-efficient world-model learning.

📝 Abstract

Whether navigating a building, operating a robot, or playing a game, an agent that acts effectively in an environment must first learn an internal model of how that environment works. Partially-observable Markov decision processes (POMDPs) provide a flexible modeling class for such internal world models, but learning them from observation-action trajectories alone is challenging and typically requires extensive environment interaction. We ask whether language-model priors can reduce costly interaction by leveraging prior knowledge, and introduce Pinductor (POMDP-inductor): an LLM proposes candidate POMDP models from a few observation-action trajectories and iteratively refines them to optimize a belief-based likelihood score. Despite using strictly less information, Pinductor matches the performance and sample efficiency of LLM-based POMDP learning methods that assume privileged access to the hidden state, while significantly surpassing the sample efficiency of tabular POMDP baselines. Further results show that performance scales with LLM capability and degrades gracefully as semantic information about the environment is withheld. Together, these results position language-model priors as a practical tool for sample-efficient world-model learning under partial observability, and a step toward generalist agents in real-world environments. Code is available at https://github.com/atomresearch/pinductor.
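The summary and abstract do not spell out the belief-based likelihood score. One standard choice for a discrete POMDP, assumed here, is the log-probability of the observed sequence given the actions, computed by filtering a belief over hidden states; it can be evaluated without ever seeing those states. The sketch below is not the Pinductor implementation: the matrix encoding, the function name `belief_log_likelihood`, and the two-state door example are hypothetical, and the hand-written candidates only stand in for models an LLM would propose.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical tabular POMDP encoding (an assumption, not Pinductor's format):
#   T[a][s][s2] = P(next state s2 | current state s, action a)
#   O[s][o]     = P(observation o | state s)
#   b0[s]       = P(initial state s)
Matrix = List[List[float]]


def belief_log_likelihood(
    T: List[Matrix],
    O: Matrix,
    b0: Sequence[float],
    observations: Sequence[int],
    actions: Sequence[int],
) -> float:
    """log P(o_0..o_n | a_0..a_{n-1}) computed by Bayesian belief filtering.

    The trajectory alternates o_0, a_0, o_1, ..., a_{n-1}, o_n, so there must be
    exactly one more observation than actions. Hidden states are never needed.
    """
    if len(observations) != len(actions) + 1:
        raise ValueError("expected len(observations) == len(actions) + 1")
    n = len(b0)
    predicted = list(b0)  # belief over the state before seeing the next observation
    total = 0.0
    for t, o in enumerate(observations):
        # Joint P(s, o | history) and the marginal P(o | history) under the model.
        joint = [predicted[s] * O[s][o] for s in range(n)]
        p_o = sum(joint)
        if p_o <= 0.0:
            return float("-inf")  # the model rules out an observation that occurred
        total += math.log(p_o)
        belief = [j / p_o for j in joint]  # Bayes update on the observation
        if t < len(actions):
            T_a = T[actions[t]]
            # Push the belief through the transition model for the action taken.
            predicted = [sum(belief[s] * T_a[s][s2] for s in range(n)) for s2 in range(n)]
    return total


# Two hand-written candidates standing in for LLM proposals (hypothetical example):
# a two-state door whose single action either toggles it ("flip") or does nothing ("stay").
O_noisy: Matrix = [[0.9, 0.1], [0.1, 0.9]]  # observations match the state 90% of the time
candidates: Dict[str, List[Matrix]] = {
    "flip": [[[0.0, 1.0], [1.0, 0.0]]],
    "stay": [[[1.0, 0.0], [0.0, 1.0]]],
}
obs = [0, 1, 0, 1]
acts = [0, 0, 0]
scores = {
    name: belief_log_likelihood(T, O_noisy, [0.5, 0.5], obs, acts)
    for name, T in candidates.items()
}
best = max(scores, key=scores.get)
print(best)  # → flip
```

In a propose-and-refine loop of the kind the abstract describes, a score like this ranks candidate models so that weaker ones can be revised; here it prefers "flip", whose dynamics explain the alternating observations far better than "stay".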

Problem

POMDP, world model, partial observability, sample efficiency, language-model priors

Innovation

POMDP, language-model priors, world model learning