🤖 AI Summary
This work addresses the problem of learning propositional STRIPS world models solely from action trajectories. The method reformulates STRIPS modeling as a next-token prediction task over pure action sequences, leveraging Transformer-based architectures for end-to-end supervised learning. It requires only positive and negative action sequences—no state observations or symbolic priors—and implicitly induces action preconditions and effects. This represents the first fully data-driven approach to learning STRIPS world models, eliminating reliance on explicit logical inference or hand-crafted predicates. Experiments demonstrate that the model accurately recovers ground-truth STRIPS rules, faithfully captures behavioral constraints and state-transition structure, and generalizes effectively across multiple domains.
📝 Abstract
We consider the problem of learning propositional STRIPS world models from action traces alone, using a deep learning architecture (transformers) and gradient descent. The task is cast as a supervised next-token prediction problem where the tokens are the actions, and an action $a$ may follow an action sequence if the hidden effects of the previous actions do not make a precondition of $a$ false. We show that a suitable transformer architecture can faithfully represent propositional STRIPS world models, and that these models can be learned from sets of random valid (positive) and invalid (negative) action sequences alone. A number of experiments are reported.
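The validity condition underlying the labeling of action sequences can be made concrete with a small sketch: under STRIPS semantics, a sequence is valid iff each action's preconditions hold in the state produced by the preceding actions. The tiny one-block domain below and all names in it are illustrative assumptions, not taken from the paper; the paper's learner sees only the action sequences and their labels, never this state machinery.

```python
# Hedged sketch of STRIPS action-sequence validity (assumed domain, not the paper's).
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset     # atoms that must be true before the action
    add: frozenset     # atoms the action makes true
    delete: frozenset  # atoms the action makes false

def is_valid_sequence(init, actions):
    """True iff every action's preconditions hold when it is applied."""
    state = set(init)
    for a in actions:
        if not a.pre <= state:   # some precondition was made false earlier
            return False
        state = (state - a.delete) | a.add
    return True

# Illustrative domain: pick up and put down a single block b.
pickup = Action("pickup(b)",
                pre=frozenset({"ontable(b)", "handempty"}),
                add=frozenset({"holding(b)"}),
                delete=frozenset({"ontable(b)", "handempty"}))
putdown = Action("putdown(b)",
                 pre=frozenset({"holding(b)"}),
                 add=frozenset({"ontable(b)", "handempty"}),
                 delete=frozenset({"holding(b)"}))

init = {"ontable(b)", "handempty"}
print(is_valid_sequence(init, [pickup, putdown]))  # → True (positive sequence)
print(is_valid_sequence(init, [pickup, pickup]))   # → False (negative sequence)
```

A ground-truth checker like this is what generates the positive and negative training sequences; the transformer must then recover the hidden preconditions and effects purely from which action tokens are allowed to follow which prefixes.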