🤖 AI Summary
This work addresses the problem of learning propositional STRIPS world models solely from action trajectories. The method reformulates STRIPS modeling as a next-token prediction task over pure action sequences, leveraging Transformer-based architectures for end-to-end supervised learning. It requires only positive and negative action sequences—no state observations or symbolic priors—and implicitly induces action preconditions and effects. This represents the first fully data-driven approach to learning STRIPS world models, eliminating reliance on explicit logical inference or hand-crafted predicates. Experiments demonstrate that the model accurately recovers ground-truth STRIPS rules, faithfully captures behavioral constraints and state-transition structure, and generalizes effectively across multiple domains.
📝 Abstract
We consider the problem of learning propositional STRIPS world models from action traces alone, using a deep learning architecture (transformers) and gradient descent. The task is cast as a supervised next-token prediction problem where the tokens are the actions, and an action $a$ may follow an action sequence if the hidden effects of the previous actions do not make a precondition of $a$ false. We show that a suitable transformer architecture can faithfully represent propositional STRIPS world models, and that these models can be learned from sets of random valid (positive) and invalid (negative) action sequences alone. A number of experiments are reported.
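The validity condition underlying the labeling of action sequences can be made concrete with a small sketch: under STRIPS semantics, a sequence is valid iff each action's preconditions hold in the state produced by the preceding actions. The tiny one-block domain below and all names in it are illustrative assumptions, not taken from the paper; the paper's learner sees only the action sequences and their labels, never this state machinery.

```python
# Hedged sketch of STRIPS action-sequence validity (assumed domain, not the paper's).
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset     # atoms that must be true before the action
    add: frozenset     # atoms the action makes true
    delete: frozenset  # atoms the action makes false

def is_valid_sequence(init, actions):
    """True iff every action's preconditions hold when it is applied."""
    state = set(init)
    for a in actions:
        if not a.pre <= state:   # some precondition was made false earlier
            return False
        state = (state - a.delete) | a.add
    return True

# Illustrative domain: pick up and put down a single block b.
pickup = Action("pickup(b)",
                pre=frozenset({"ontable(b)", "handempty"}),
                add=frozenset({"holding(b)"}),
                delete=frozenset({"ontable(b)", "handempty"}))
putdown = Action("putdown(b)",
                 pre=frozenset({"holding(b)"}),
                 add=frozenset({"ontable(b)", "handempty"}),
                 delete=frozenset({"holding(b)"}))

init = {"ontable(b)", "handempty"}
print(is_valid_sequence(init, [pickup, putdown]))  # → True (positive sequence)
print(is_valid_sequence(init, [pickup, pickup]))   # → False (negative sequence)
```

A ground-truth checker like this is what generates the positive and negative training sequences; the transformer must then recover the hidden preconditions and effects purely from which action tokens are allowed to follow which prefixes.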