🤖 AI Summary
This work addresses the growing demand for efficient and flexible structured generation in dynamic tasks—such as tool calling—for large language model (LLM) agents. To this end, we propose XGrammar 2, a high-performance structured generation engine that substantially reduces the overhead of dynamic structured output through several key innovations: a novel TagDispatch mechanism for dynamic semantic dispatch, just-in-time (JIT) compilation, cross-grammar caching, an Earley-parser-based mask generation algorithm, and compression techniques for repetitive structures. Experimental results demonstrate that XGrammar 2 achieves over a 6× speedup compared to existing engines while introducing negligible latency when integrated into LLM inference pipelines, offering an efficient and low-overhead solution for dynamic structured generation.
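The summary names TagDispatch as a mechanism for dynamic semantic dispatch, e.g. for tool calling. The general idea of tag-triggered dispatch can be sketched as follows; this is a minimal illustration under assumed semantics, not XGrammar 2's actual interface, and the tag names and structure check are hypothetical:

```python
# Hypothetical sketch of tag-based dynamic dispatch: free-form text is
# unconstrained until a trigger tag appears, after which the span up to
# the closing tag must match that tag's structure. The tag and the
# regex stand-in for a per-tag grammar are illustrative only.
import re

DISPATCH = {
    "<tool_call>": re.compile(r'\{"name": "\w+"\}'),  # stand-in for a JSON grammar
}

def validate_stream(text: str) -> bool:
    """Accept free text; validate every tagged span against the
    structure registered for its trigger tag."""
    for tag, pattern in DISPATCH.items():
        end_tag = tag.replace("<", "</", 1)
        start = text.find(tag)
        while start != -1:
            end = text.find(end_tag, start)
            if end == -1:                      # unclosed tagged span
                return False
            body = text[start + len(tag):end]
            if not pattern.fullmatch(body):    # structured check inside the span
                return False
            start = text.find(tag, end + len(end_tag))
    return True

ok = validate_stream('Sure! <tool_call>{"name": "search"}</tool_call> done.')
bad = validate_stream('<tool_call>oops</tool_call>')
```

Here the surrounding prose is unconstrained while each `<tool_call>` span is checked structurally, which mirrors the dispatch-on-tag behavior the summary describes.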
📝 Abstract
Modern LLM agents are required to handle increasingly complex structured generation tasks, such as tool calling and conditional structured generation. These tasks are significantly more dynamic than predefined structures, posing new challenges to current structured generation engines. In this paper, we propose XGrammar 2, a highly optimized structured generation engine for agentic LLMs. XGrammar 2 accelerates mask generation for these dynamic structured generation tasks through a new dynamic dispatching semantics, TagDispatch. We further introduce a just-in-time (JIT) compilation method to reduce compilation time and a cross-grammar caching mechanism that exploits common sub-structures across different grammars. Additionally, we extend the previous pushdown-automaton (PDA)-based mask generation algorithm to an Earley-parser-based one and design a repetition compression algorithm to handle repetitive structures in grammars. Evaluation results show that XGrammar 2 achieves more than a 6× speedup over existing structured generation engines. Integrated with an LLM inference engine, XGrammar 2 handles dynamic structured generation tasks with near-zero overhead.
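The mask generation the abstract refers to is the step common to structured generation engines: at each decoding step, vocabulary entries that cannot legally continue the output are masked out. A toy sketch of this idea, with a hypothetical character-level vocabulary and a trivial "grammar" accepting only `true` or `false` (not XGrammar 2's algorithm or API):

```python
# Toy illustration of mask-based constrained decoding. The "grammar"
# accepts exactly the strings "true" and "false"; at each step we build
# a boolean mask over a tiny character vocabulary and pick the highest-
# scoring allowed token. All names here are hypothetical.
VOCAB = ["t", "r", "u", "e", "f", "a", "l", "s", "{", "}"]
TARGETS = ["true", "false"]

def allowed_mask(prefix: str) -> list[bool]:
    """True where appending the token keeps the output a valid prefix
    of some accepted string."""
    return [any(t.startswith(prefix + tok) for t in TARGETS) for tok in VOCAB]

def greedy_constrained(logits_fn, max_steps: int = 8) -> str:
    """Greedy decode, suppressing disallowed tokens at every step."""
    out = ""
    for _ in range(max_steps):
        if out in TARGETS:                      # accepted string: stop
            return out
        mask = allowed_mask(out)
        scores = logits_fn(out)
        best, best_score = None, float("-inf")
        for i, tok in enumerate(VOCAB):
            if mask[i] and scores[i] > best_score:
                best, best_score = tok, scores[i]
        out += best
    return out

# A "model" that always prefers 'f': the mask still forces a legal string.
result = greedy_constrained(lambda prefix: [1.0 if v == "f" else 0.0 for v in VOCAB])
# → "false"
```

In a real engine the mask is computed over a tokenizer vocabulary of ~100K multi-character tokens against a context-free grammar, which is why efficient mask generation (e.g. the Earley-parser-based algorithm above) is the performance bottleneck being optimized.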