Generating Symbolic World Models via Test-time Scaling of Large Language Models

📅 2025-02-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address key challenges in generating Planning Domain Definition Language (PDDL) with large language models (LLMs)—including imprecise state modeling, constraint violations, and lack of optimality guarantees—this paper proposes a fine-tuning-free test-time optimization framework. Methodologically, it introduces: (1) the first test-time scaling framework for PDDL domain generation; (2) a synergistic optimization algorithm integrating Best-of-N sampling with verbalized machine learning; and (3) a verifiable, symbolic PDDL world model seamlessly coupled with A* search to ensure optimal plan generation. Evaluated on two PDDL generation tasks, the approach achieves a success rate of over 50%, substantially outperforming o1-mini. Moreover, it establishes new state-of-the-art performance across multiple competition-grade planning benchmarks.

📝 Abstract
Solving complex planning problems requires Large Language Models (LLMs) to explicitly model state transitions to avoid rule violations, comply with constraints, and ensure optimality, a task hindered by the inherent ambiguity of natural language. To overcome this ambiguity, the Planning Domain Definition Language (PDDL) is leveraged as a planning abstraction that enables precise and formal state descriptions. With PDDL, we can generate a symbolic world model where classical search algorithms, such as A*, can be seamlessly applied to find optimal plans. However, directly generating PDDL domains with current LLMs remains an open challenge due to the lack of PDDL training data. To address this challenge, we propose to scale up the test-time computation of LLMs to enhance their PDDL reasoning capabilities, thereby enabling the generation of high-quality PDDL domains. Specifically, we introduce a simple yet effective algorithm that first employs Best-of-N sampling to improve the quality of the initial solution and then refines that solution in a fine-grained manner with verbalized machine learning. Our method outperforms o1-mini by a considerable margin on PDDL domain generation, achieving a success rate of over 50% on two tasks (i.e., generating PDDL domains from natural language descriptions or from PDDL problems), without requiring any additional training. By taking advantage of PDDL as a state abstraction, our method outperforms current state-of-the-art methods on almost all competition-level planning tasks.
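The two-stage test-time procedure the abstract describes (Best-of-N sampling followed by feedback-driven refinement) can be sketched roughly as below. The generate/score/critique/revise callables are hypothetical toy stand-ins invented for illustration, not the paper's actual LLM prompts or verifiers:

```python
def best_of_n(generate, score, n):
    """Best-of-N sampling: draw N candidates and keep the highest-scoring one."""
    return max((generate() for _ in range(n)), key=score)

def refine(solution, critique, revise, max_steps=5):
    """Iteratively revise a solution from textual feedback, a toy stand-in
    for the fine-grained verbalized-machine-learning refinement stage."""
    for _ in range(max_steps):
        feedback = critique(solution)
        if feedback is None:        # verifier found no remaining issues
            return solution
        solution = revise(solution, feedback)
    return solution

# Toy stand-ins (NOT the paper's models): candidates are integers,
# score prefers larger values, critique flags anything below 95.
pool = iter([42, 87, 63, 91, 15])
best = best_of_n(lambda: next(pool), score=lambda x: x, n=5)   # -> 91
refined = refine(best,
                 critique=lambda x: "score too low" if x < 95 else None,
                 revise=lambda x, fb: x + 2)                   # 91 -> 93 -> 95
print(best, refined)
```

In the paper's setting, candidates would be generated PDDL domains, the score would come from a verifier, and the critique would be verbalized feedback fed back into the LLM.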
Problem

Research questions and friction points this paper is trying to address.

Enhance PDDL reasoning with LLMs
Generate symbolic world models
Improve planning task success rates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhance PDDL reasoning via test-time scaling
Best-of-N sampling improves solution quality
Verbalized machine learning refines solutions
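The remaining ingredient, coupling a verified symbolic world model with A* search, can be illustrated with a minimal sketch. The state space, action costs, and heuristic below are invented for illustration (a walk on a number line, not a PDDL domain); the guarantee shown, that an admissible heuristic yields a minimum-cost plan, is the standard A* property the paper relies on:

```python
import heapq

def astar(start, goal, successors, heuristic):
    """A* over a symbolic state space; returns a minimum-cost action
    sequence when the heuristic never overestimates (is admissible)."""
    frontier = [(heuristic(start), 0, start, [])]   # (f, g, state, plan)
    best_g = {start: 0}
    while frontier:
        _, g, state, plan = heapq.heappop(frontier)
        if state == goal:
            return plan
        for action, nxt, cost in successors(state):
            ng = g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + heuristic(nxt), ng, nxt, plan + [action]))
    return None  # goal unreachable

# Made-up domain: walk from 0 to 5 using "+1"/"+2" moves (unit cost each);
# ceil(remaining / 2) is an admissible heuristic, so the plan is optimal.
GOAL = 5
def successors(s):
    return [(a, s + int(a), 1) for a in ("+1", "+2") if s + int(a) <= GOAL]

plan = astar(0, GOAL, successors, heuristic=lambda s: (GOAL - s + 1) // 2)
print(plan)   # an optimal 3-move plan covering distance 5
```

With an LLM-generated PDDL domain in place of this toy transition function, the same search machinery recovers provably optimal plans, which is the benefit of symbolic abstraction over free-form LLM planning.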
Zhouliang Yu
The SphereLab, CUHK
Reinforcement Learning · LLM · Formal AI
Yuhuan Yuan
The Hong Kong University of Science and Technology (Guangzhou)
Tim Z. Xiao
University of Tübingen · International Max Planck Research School for Intelligent Systems (IMPRS-IS)
Machine Learning · Probabilistic Models · Large Language Models
Fuxiang Frank Xia
Environmental Systems Research Institute, Inc.
Jie Fu
Shanghai Artificial Intelligence Laboratory
Ge Zhang
SEED, Bytedance
Ge Lin
The Hong Kong University of Science and Technology (Guangzhou)
Weiyang Liu
CUHK | Max Planck Institute for Intelligent Systems
Machine Learning · Artificial Intelligence · Computer Vision