Towards a General Framework for HTN Modeling with LLMs

📅 2025-11-22
🤖 AI Summary
Problem: Large language models (LLMs) produce low-quality and often syntactically invalid output when generating hierarchical planning (HP) models, particularly hierarchical task network (HTN) representations. Method: We propose L2HP, an extension of the L2P library (LLM-driven PDDL model generation) that adds support for HP model generation and is designed for generality and extensibility. We use it to compare LLMs' modeling capabilities on HP and on non-hierarchical automated planning (AP) using the PlanBench dataset. Results: Parsing success is limited but comparable in both settings (around 36%), while syntactic validity is substantially lower in the hierarchical case (1% vs. 20% of instances). These results point to challenges specific to HP for LLMs and to the need for further research on the quality of generated HP models.

📝 Abstract
The use of Large Language Models (LLMs) for generating Automated Planning (AP) models has been widely explored; however, their application to Hierarchical Planning (HP) is still far from reaching the level of sophistication observed in non-hierarchical architectures. In this work, we try to address this gap. We present two main contributions. First, we propose L2HP, an extension of L2P (a library for LLM-driven PDDL model generation) that supports HP model generation and follows a design philosophy of generality and extensibility. Second, we apply our framework to perform experiments in which we compare the modeling capabilities of LLMs for AP and HP. On the PlanBench dataset, results show that parsing success is limited but comparable in both settings (around 36%), while syntactic validity is substantially lower in the hierarchical case (1% vs. 20% of instances). These findings underscore the unique challenges HP presents for LLMs, highlighting the need for further research to improve the quality of generated HP models.
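
The two figures reported in the abstract (parsing success and syntactic validity) can be read as per-setting rates over benchmark instances. The sketch below shows one way such rates might be tallied. It is not the L2HP or L2P API: the `Outcome` record, the `rates` function, and the assumption that both metrics use the total number of instances as the denominator are hypothetical, and the toy data is made up for illustration rather than taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Hypothetical per-instance record; not part of L2P or L2HP."""
    setting: str               # "AP" (non-hierarchical) or "HP" (hierarchical)
    parsed: bool               # the LLM output could be parsed into a model
    syntactically_valid: bool  # the parsed model also passed a syntax check


def rates(outcomes, setting):
    """Return parsing-success and syntactic-validity rates for one setting.

    Assumption: both rates are fractions of all instances in the setting,
    and an instance only counts as valid if it was parsed first.
    """
    selected = [o for o in outcomes if o.setting == setting]
    if not selected:
        raise ValueError(f"no outcomes recorded for setting {setting!r}")
    total = len(selected)
    parsed = sum(o.parsed for o in selected)
    valid = sum(o.parsed and o.syntactically_valid for o in selected)
    return {
        "parsing_success": parsed / total,
        "syntactic_validity": valid / total,
    }


# Toy data for illustration only; these are not the paper's results.
toy = [
    Outcome("HP", parsed=True, syntactically_valid=False),
    Outcome("HP", parsed=False, syntactically_valid=False),
    Outcome("AP", parsed=True, syntactically_valid=True),
    Outcome("AP", parsed=False, syntactically_valid=False),
]

if __name__ == "__main__":
    for name in ("AP", "HP"):
        print(name, rates(toy, name))
```

Under these assumptions, a setting can have the same parsing-success rate as another while its syntactic-validity rate is much lower, which is the pattern the abstract reports for HP relative to AP.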
Problem

Research questions and friction points this paper is trying to address.

Developing a general framework for Hierarchical Task Network modeling using Large Language Models
Addressing the sophistication gap between hierarchical and non-hierarchical planning with LLMs
Improving the quality and syntactic validity of generated hierarchical planning models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends the L2P library to support hierarchical planning model generation
Proposes general framework supporting HTN modeling with LLMs
Compares LLM modeling capabilities on non-hierarchical (AP) versus hierarchical (HP) planning using PlanBench
Israel Puerta-Merino
University of Granada, Spain
Carlos Núñez-Molina
RWTH Aachen University, Germany
Pablo Mesejo
Associate Professor, University of Granada & chief AI officer, Panacea Cooperative Research
Computer Vision, Machine Learning, Artificial Intelligence, Biomedical Image Analysis
Juan Fernández-Olivares
University of Granada, Spain