🤖 AI Summary
Large language models (LLMs) often underperform on complex, domain-specific tasks—such as code generation, class naming, and expert question answering—due to insufficient structured reasoning capabilities. To address this, we propose SHERPA, the first framework that explicitly models domain knowledge as a hierarchical state machine, with LLM-driven state transitions enabling tight integration of rule-guided constraints and data-driven inference. Crucially, SHERPA requires no additional model training; instead, it controls LLM behavior by structuring execution at runtime. Empirical evaluation across diverse complex tasks demonstrates that SHERPA significantly outperforms stateless baselines, with the largest gains in scenarios that rely heavily on human expertise and formal conventions. These results validate both the effectiveness and broad applicability of structured execution control for enhancing LLM capabilities.
📝 Abstract
Recently, large language models (LLMs) have achieved widespread application across various fields. Despite their impressive capabilities, LLMs suffer from a lack of structured reasoning ability, particularly for complex tasks requiring domain-specific best practices, which are often unavailable in the training data. Although multi-step prompting methods incorporating human best practices, such as chain-of-thought and tree-of-thought, have gained popularity, they lack a general mechanism to control LLM behavior. In this paper, we propose SHERPA, a model-driven framework that improves LLM performance on complex tasks by explicitly incorporating domain-specific best practices into hierarchical state machines. By structuring the LLM execution process using state machines, SHERPA enables more fine-grained control over LLM behavior via rules or decisions driven by machine learning-based approaches, including LLMs themselves. We show that SHERPA is applicable to a wide variety of tasks, specifically code generation, class name generation, and question answering, replicating previously proposed approaches while further improving their performance. We demonstrate the effectiveness of SHERPA for these tasks using various LLMs. Our systematic evaluation compares different state machine configurations against baseline approaches without state machines. Results show that integrating well-designed state machines significantly improves the quality of LLM outputs, and is particularly beneficial for complex tasks with well-established human best practices but little corresponding data in LLM training corpora.
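The core idea described above can be illustrated with a minimal sketch: an execution process structured as a state machine whose transitions are chosen by a pluggable decision function, which could be a fixed best-practice rule or an LLM call, while the machine itself constrains which transitions are legal. All names here (`StateMachine`, `rule_decider`, etc.) are illustrative assumptions, not SHERPA's actual API, and the decider is a stub rather than a real LLM call.

```python
# Illustrative sketch (NOT SHERPA's real implementation): a state machine
# that structures an LLM-driven workflow. The decider proposes the next
# state; the machine enforces the allowed transitions (best practices).

from dataclasses import dataclass, field
from typing import Callable, Dict, List

# (current state, task context) -> proposed next state
Decider = Callable[[str, str], str]

@dataclass
class StateMachine:
    transitions: Dict[str, List[str]]      # allowed successor states per state
    decide: Decider                        # rule-based or model-based chooser
    trace: List[str] = field(default_factory=list)

    def run(self, state: str, context: str, final: str,
            max_steps: int = 10) -> List[str]:
        self.trace = [state]
        for _ in range(max_steps):
            if state == final:
                break
            options = self.transitions[state]
            proposal = self.decide(state, context)
            # Constrain the decider's output to legal transitions only.
            state = proposal if proposal in options else options[0]
            self.trace.append(state)
        return self.trace

def rule_decider(state: str, context: str) -> str:
    # A hard-coded best-practice rule: always review a draft before finishing.
    # In SHERPA's setting, an LLM could make this decision instead.
    return {"draft": "review", "review": "done"}.get(state, "done")

sm = StateMachine(
    transitions={"draft": ["review"], "review": ["draft", "done"], "done": []},
    decide=rule_decider,
)
print(sm.run("draft", "generate a class name", final="done"))
# → ['draft', 'review', 'done']
```

The point of the sketch is the separation of concerns the abstract describes: the state machine encodes domain best practices as hard constraints, while the decision function supplies data-driven judgment within those constraints, and no model training is required.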