🤖 AI Summary
This work addresses the challenge of combinatorial explosion faced by large language models (LLMs) in complex planning tasks, where the scalability of LLMs as formalizers remains unclear. The authors propose a novel paradigm termed "higher-order formalization," in which the LLM generates program generators rather than direct solutions, thereby decoupling output length from the exponential growth of the formalization space. To enhance robustness, they further introduce a divide-and-conquer formalization strategy. The approach is evaluated on a newly constructed benchmark, Unraveling, featuring tasks such as BlocksWorld with state spaces as large as $10^{165}$. Experimental results demonstrate that this paradigm enables LLM-based formalizers to maintain perfect accuracy while significantly improving the scalability and stability of smaller models in complex planning scenarios.
📄 Abstract
Recent work shows overwhelming evidence that LLMs, even those trained to scale their reasoning trace, perform unsatisfactorily on planning problems that are too complex. Whether the same conclusion holds for LLM formalizers that generate solver-oriented programs remains unknown. We systematically show that LLM formalizers greatly out-scale LLM planners, some retaining perfect accuracy in the classic BlocksWorld domain with a huge state space of size up to $10^{165}$. While the performance of smaller LLM formalizers degrades with problem complexity, we show that a divide-and-conquer formalizing technique can greatly improve their robustness. Finally, we introduce unraveling problems, in which one line of problem description realistically corresponds to exponentially many lines of formal language such as the Planning Domain Definition Language (PDDL), greatly challenging LLM formalizers. We tackle this challenge by introducing a new paradigm, namely LLM-as-higher-order-formalizer, where an LLM generates a program generator. This decouples token output from the combinatorial explosion of the underlying formalization and search space.
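To make the higher-order-formalizer idea concrete, here is a minimal sketch (our own illustration, not the paper's code) of the kind of program generator an LLM could emit: a few lines of Python that produce a BlocksWorld PDDL problem for any number of blocks `n`. The generator's length is constant, while the PDDL it emits, and the state space that PDDL induces, grow combinatorially with `n`. The function name and the reverse-stack goal are illustrative choices, not taken from the paper.

```python
def generate_blocksworld_pddl(n: int) -> str:
    """Emit a PDDL problem: n blocks stacked b1..bn; the goal is the reversed stack.

    The generator stays a few lines long for any n, but the planning
    problem it describes has a state space that explodes combinatorially.
    """
    blocks = [f"b{i}" for i in range(1, n + 1)]
    # Initial tower: b1 on b2 on ... on bn, bn on the table, b1 clear.
    init = [f"(on {a} {b})" for a, b in zip(blocks, blocks[1:])]
    init += [f"(ontable {blocks[-1]})", f"(clear {blocks[0]})", "(handempty)"]
    # Goal tower: the same stack upside down, b1 on the table.
    goal = [f"(on {b} {a})" for a, b in zip(blocks, blocks[1:])]
    goal += [f"(ontable {blocks[0]})"]
    return "\n".join([
        "(define (problem reverse-stack)",
        "  (:domain blocksworld)",
        f"  (:objects {' '.join(blocks)})",
        f"  (:init {' '.join(init)})",
        f"  (:goal (and {' '.join(goal)})))",
    ])

print(generate_blocksworld_pddl(3))
```

Because the LLM only has to write the generator, its token output stays short even when the instantiated PDDL would be far too long to produce directly, which is exactly the decoupling the abstract describes.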