Blueprint First, Model Second: A Framework for Deterministic LLM Workflow

📅 2025-07-31

📈 Citations: 0

✨ Influential: 0

career value

200K/year

🤖 AI Summary

Non-determinism in large language model (LLM) agents severely limits their applicability to structured tasks requiring high procedural fidelity and predictable execution. To address this, we propose a “blueprint-first, model-second” framework that decouples LLM generative capability from workflow logic: executable blueprints are explicitly defined in source code and verified for correctness, then executed by a deterministic runtime engine; LLMs serve solely as constrained, on-demand tools for complex subtasks, enforcing strict separation between planning and execution. This architecture achieves the first deterministic encapsulation of LLM invocations—guaranteeing reproducible, stable execution regardless of LLM output variability. Evaluated on tau-bench, our approach achieves a Pass@1 score averaging 10.1 percentage points higher than the strongest baseline. It significantly enhances process controllability, result reproducibility, and execution robustness in LLM-driven automation.

Technology Category

Application Category

📝 Abstract

While powerful, the inherent non-determinism of large language model (LLM) agents limits their application in structured operational environments where procedural fidelity and predictable execution are strict requirements. This limitation stems from current architectures that conflate probabilistic, high-level planning with low-level action execution within a single generative process. To address this, we introduce the Source Code Agent framework, a new paradigm built on the "Blueprint First, Model Second" philosophy. Our framework decouples the workflow logic from the generative model. An expert-defined operational procedure is first codified into a source code-based Execution Blueprint, which is then executed by a deterministic engine. The LLM is strategically invoked as a specialized tool to handle bounded, complex sub-tasks within the workflow, but never to decide the workflow's path. We conduct a comprehensive evaluation on the challenging tau-bench benchmark, designed for complex user-tool-rule scenarios. Our results demonstrate that the Source Code Agent establishes a new state-of-the-art, outperforming the strongest baseline by 10.1 percentage points on the average Pass^1 score while dramatically improving execution efficiency. Our work enables the verifiable and reliable deployment of autonomous agents in applications governed by strict procedural logic.

Problem

Research questions and friction points this paper is trying to address.

Addresses non-determinism in LLM agents for structured environments

Decouples workflow logic from generative model execution

Enables verifiable autonomous agents in strict procedural applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decouples workflow logic from generative model

Uses expert-defined Execution Blueprint for procedures

LLM handles bounded sub-tasks, not workflow path

🔎 Similar Papers

AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks