WGRAMMAR: Leverage Prior Knowledge to Accelerate Structured Decoding

📅 2025-07-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Structured decoding—e.g., for HTML or JSON generation—faces efficiency bottlenecks from grammar compilation, state tracking, and mask construction. To address this, we propose a constraint-decomposition framework grounded in prior structural knowledge: syntax constraints are decoupled into static, pre-compilable components and dynamic runtime parameters, and compositional regular operators replace traditional pushdown automata to reduce state-transition overhead. Combining this decomposition with domain-aware simplification and mask caching yields wgrammar, a lightweight decoding engine. Experiments demonstrate up to a 250× decoding speedup over existing structured decoding systems while preserving generation correctness. The source code is publicly available.

📝 Abstract
Structured decoding enables large language models (LLMs) to generate outputs in formats required by downstream systems, such as HTML or JSON. However, existing methods suffer from efficiency bottlenecks due to grammar compilation, state tracking, and mask creation. We observe that many real-world tasks embed strong prior knowledge about output structure. Leveraging this, we propose a decomposition of constraints into static and dynamic components -- precompiling static structures offline and instantiating dynamic arguments at runtime using grammar snippets. Instead of relying on pushdown automata, we employ a compositional set of operators to model regular formats, achieving lower transition latency. We introduce wgrammar, a lightweight decoding engine that integrates domain-aware simplification, constraint decomposition, and mask caching, achieving up to 250x speedup over existing systems. wgrammar's source code is publicly available at https://github.com/wrran/wgrammar.
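The abstract's claim that compositional operators can stand in for a pushdown automaton on regular formats can be sketched for the simplest case, fixed-shape patterns. The names below (`Fragment`, `lit`, `charclass`, `seq`) are illustrative assumptions, not wgrammar's API; the point is that state tracking collapses to an integer position and each per-step mask lookup is a flat O(1) access with no stack operations:

```python
# Illustrative sketch (hypothetical names, not from the paper's code):
# compose a regular format from operators instead of building a PDA.

class Fragment:
    """One decoding step: the set of characters legal at this position."""
    def __init__(self, allowed):
        self.allowed = frozenset(allowed)

def lit(text):
    # Literal text: one single-character fragment per position.
    return [Fragment({c}) for c in text]

def charclass(chars, repeat=1):
    # Fixed-width character class, e.g. exactly four digits.
    return [Fragment(chars)] * repeat

def seq(*parts):
    # Concatenation of operators is just list concatenation.
    return [frag for part in parts for frag in part]

DIGITS = "0123456789"

# A fixed-shape date format, YYYY-MM-DD. The "automaton state" is a
# plain integer position into this precompiled list.
date_fmt = seq(charclass(DIGITS, 4), lit("-"),
               charclass(DIGITS, 2), lit("-"),
               charclass(DIGITS, 2))

def allowed_at(fmt, pos):
    # O(1) lookup of the legal character set at a given position.
    return fmt[pos].allowed

def accepts(fmt, s):
    # Full-string check: right length, every character legal.
    return len(s) == len(fmt) and all(c in f.allowed for c, f in zip(s, fmt))
```

Real grammars also need alternation and unbounded repetition, which this toy omits; the sketch only shows why flat compositional structure avoids per-token stack pushes and pops.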
Problem

Research questions and friction points this paper is trying to address.

Accelerate structured decoding in LLMs
Reduce efficiency bottlenecks in grammar processing
Leverage prior knowledge for output structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decompose constraints into static and dynamic components
Use compositional operators for regular formats
Integrate domain-aware simplification and mask caching
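The three ideas above can be combined in a toy sketch. Everything here is a hypothetical illustration, not the paper's implementation: the `{}`-placeholder template, `SlotConstraint`, and `slot_mask` are invented names, and single characters stand in for tokens. The static skeleton is compiled once offline, while dynamic slots get their allowed-token masks lazily, with caching so repeated constraints never rebuild a mask:

```python
# Illustrative sketch (not wgrammar's actual API): decompose a JSON-like
# template into a static skeleton, known offline, and dynamic slots whose
# allowed-token masks are built lazily and cached at runtime.
from functools import lru_cache

VOCAB = [chr(c) for c in range(32, 127)]  # toy single-character "tokens"

class SlotConstraint:
    """Dynamic constraint for one slot, e.g. digits only."""
    def __init__(self, allowed):
        self.allowed = frozenset(allowed)

@lru_cache(maxsize=None)
def slot_mask(allowed):
    # Mask construction is the hot path; cache one mask per constraint.
    return [tok in allowed for tok in VOCAB]

def compile_template(template, slots):
    """Split the template on '{}' placeholders into static literals
    interleaved with dynamic slot constraints bound at runtime."""
    parts = template.split("{}")
    assert len(parts) == len(slots) + 1
    plan = []
    for literal, slot in zip(parts, slots):
        plan.append(("static", literal))
        plan.append(("dynamic", slot))
    plan.append(("static", parts[-1]))
    return plan

def masks_for(plan):
    """Yield one allowed-token mask per output position.

    Simplification: each dynamic slot emits a single mask here; a real
    engine would repeat it until the slot's constraint is satisfied."""
    for kind, item in plan:
        if kind == "static":
            # Static text: exactly one legal token per position.
            for ch in item:
                yield [tok == ch for tok in VOCAB]
        else:
            # Dynamic slot: cached mask, reused across requests.
            yield slot_mask(item.allowed)

digits = SlotConstraint("0123456789")
plan = compile_template('{"id": {}, "score": {}}', [digits, digits])
first_masks = list(masks_for(plan))
```

Because the skeleton is fixed, the static masks could themselves be precomputed offline; at runtime only the slot constraints vary, which is what makes the cached `slot_mask` lookups effective.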