Zero-Shot Goal Recognition with Large Language Models

📅 2026-05-14

🤖 AI Summary
This work investigates whether large language models (LLMs) can perform zero-shot goal recognition: inferring an agent's goal from a sequence of its observed actions without task-specific training data or examples. It provides the first systematic zero-shot evaluation of LLMs as goal recognizers on classical PDDL planning benchmarks, proposing goal recognition as a way to probe the models' foundational planning knowledge. Using zero-shot prompting, the study analyzes how well several frontier LLMs combine world knowledge with observed evidence. The results show that some models become more accurate as more observations become available, approaching the accuracy of landmark-based methods when all observations are given. Others remain anchored to prior knowledge and fail to incorporate new evidence. Analysis of the models' reasoning traces attributes this divergence to a fundamental difference in evidence integration rather than to domain familiarity.
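The evidence-integration behavior described above can be illustrated with the standard Bayesian view of probabilistic goal recognition. A prior over candidate goals, P(G), stands in for world knowledge, and it is updated by the likelihood of each observed action, P(o | G). This is a minimal sketch, not the paper's LLM prompting method: the goal names, actions and likelihood values are hypothetical, and observations are assumed independent given the goal. A recognizer that integrates evidence moves away from the prior as observations accumulate; one that stays anchored keeps predicting the prior's favorite.

```python
from math import prod


def goal_posterior(priors, likelihoods, observations, floor=1e-6):
    """Bayesian posterior P(G | O) over candidate goals.

    priors:       dict goal -> P(G), the world-knowledge prior
    likelihoods:  dict goal -> dict observation -> P(o | G)
    observations: observed actions, assumed independent given the goal
    floor:        probability given to an observation a goal does not explain
                  (an arbitrary assumption, so no goal is ruled out entirely)
    """
    unnormalized = {
        goal: p * prod(likelihoods[goal].get(o, floor) for o in observations)
        for goal, p in priors.items()
    }
    total = sum(unnormalized.values())
    return {goal: v / total for goal, v in unnormalized.items()}


def most_likely_goal(posterior):
    """Return the goal with the highest posterior probability."""
    return max(posterior, key=posterior.get)


# Hypothetical Blocksworld-style example: the prior favors tower_abc
# (a on b on c), but the observed actions only fit tower_cba (c on b on a).
priors = {"tower_abc": 0.8, "tower_cba": 0.2}
likelihoods = {
    "tower_abc": {"stack(b,c)": 0.5, "stack(a,b)": 0.5},
    "tower_cba": {"stack(b,a)": 0.5, "stack(c,b)": 0.5},
}
observed = ["stack(b,a)", "stack(c,b)"]

# Recompute the posterior as the observed prefix grows.
for k in range(len(observed) + 1):
    post = goal_posterior(priors, likelihoods, observed[:k])
    best = most_likely_goal(post)
    print(k, best, round(post[best], 3))
```

With these hypothetical numbers the prediction is tower_abc with no observations (the prior alone) and switches to tower_cba after the first observed action, which is the kind of evidence-driven shift the summary contrasts with prior-anchored models.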
📝 Abstract
Large language models have recently reached near-parity with classical planners on well-known planning domains, yet this competence relies on world-knowledge exploitation rather than genuine symbolic reasoning. Goal recognition is a complementary abductive task structurally better suited to LLM strengths: it consists of evaluating consistency with world knowledge rather than generating novel action sequences. This paper provides the first systematic zero-shot evaluation of frontier LLMs as goal recognisers on key classical PDDL benchmarks. Our results show that LLM competence on goal recognition is uneven: some models scale with evidence and approach landmark-based accuracy at full observations, while others remain anchored to world-knowledge priors regardless of how much evidence accumulates. Qualitative analysis of model reasoning traces reveals that this divergence reflects a fundamental difference in evidence integration rather than domain familiarity. These findings position goal recognition as a principled benchmark for the foundational planning knowledge of LLMs.
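The landmark-based baseline mentioned above can be pictured as scoring each candidate goal by how many of its landmarks (facts that must hold at some point in every plan achieving that goal) the observed behavior has already achieved. The sketch below is a simplified single-ratio variant of that idea, assuming landmark sets are precomputed; published landmark-based heuristics differ in detail, and this is not presented as the paper's baseline configuration. The goal names and landmark sets are hypothetical.

```python
def landmark_completion(goal_landmarks, achieved_facts):
    """Fraction of each goal's landmarks achieved by the observed trajectory.

    goal_landmarks: dict goal -> set of fact landmarks (assumed precomputed,
                    e.g. by a landmark extractor run on the planning task)
    achieved_facts: set of facts that held in some state along the observations
    """
    return {
        goal: len(lms & achieved_facts) / len(lms) if lms else 0.0
        for goal, lms in goal_landmarks.items()
    }


def recognize(goal_landmarks, achieved_facts):
    """Return the sorted list of goals with the highest completion ratio."""
    scores = landmark_completion(goal_landmarks, achieved_facts)
    if not scores:
        return []
    best = max(scores.values())
    return sorted(goal for goal, s in scores.items() if s == best)


# Hypothetical landmark sets for two Blocksworld-style tower goals.
goal_landmarks = {
    "tower_abc": {"holding(a)", "holding(b)", "on(a,b)", "on(b,c)"},
    "tower_cba": {"holding(b)", "holding(c)", "on(b,a)", "on(c,b)"},
}
print(recognize(goal_landmarks, set()))
print(recognize(goal_landmarks, {"holding(b)", "on(b,a)"}))
```

With no observations every goal scores 0 and both are returned as tied; once holding(b) and on(b,a) have been observed, tower_cba has 2 of 4 landmarks achieved against 1 of 4 for tower_abc and is returned alone. Returning all tied goals rather than picking one arbitrarily keeps the sketch explicit about low-evidence cases.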
Problem

Research questions and friction points this paper is trying to address.

zero-shot
goal recognition
large language models
PDDL
evidence integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-shot goal recognition
large language models
evidence integration
PDDL benchmarks
abductive reasoning