🤖 AI Summary
This paper addresses the lack of a unified evaluation standard for the planning capabilities of large language models (LLMs). Building on classical AI planning theory (Kartam and Wilkins, 1990) and contemporary empirical LLM research, it proposes a six-dimensional evaluation framework covering completeness, executability, optimality, representation, generalization, and efficiency. Through systematic bibliometric and comparative analysis across diverse tasks (e.g., web navigation, travel planning, database querying) and model architectures, it constructs a structured capability map of mainstream LLM-based planners and identifies where each family of methods succeeds and where it falls short. The contributions are threefold: (1) a multidimensional, unified evaluation framework for LLM planning; (2) an extensible, benchmark-grounded analytical protocol; and (3) the identification of three critical future research directions. The work provides both theoretical foundations and practical guidelines for evaluating and improving planning capabilities in agentic AI systems.
📝 Abstract
Large language models (LLMs) have immense potential for generating plans that transform an initial world state into a desired goal state. A large body of research has explored the use of LLMs for various planning tasks, from web navigation to travel planning and database querying. However, many of these systems are tailored to specific problems, which makes it difficult to compare them or to determine the best approach for a new task. Clear and consistent evaluation criteria are also lacking. Our survey aims to fill this gap by offering a comprehensive overview of current LLM planners. It builds on foundational work by Kartam and Wilkins (1990) and examines six key performance criteria: completeness, executability, optimality, representation, generalization, and efficiency. For each criterion, we provide a thorough analysis of representative works and highlight their strengths and weaknesses. Our paper also identifies crucial future directions, making it a valuable resource for both practitioners and newcomers interested in leveraging LLM planning to support agentic workflows.