Lost in Space: Optimizing Tokens for Grammar-Constrained Decoding

📅 2025-02-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance degradation of large language models (LLMs) in zero-shot structured output generation (e.g., classification, annotation) caused by syntactic constraints during decoding. The authors systematically investigate how token-level formatting variations, such as numeric type representation and leading whitespace, affect outputs that are semantically equivalent but syntactically distinct. Conducting controlled experiments across four LLM families and four NLP benchmarks, they evaluate five syntactically constrained output formats. They find that prepending a whitespace token consistently improves accuracy by 5–10%, with smaller models exhibiting greater sensitivity to such formatting choices, and that the optimal format depends on both model scale and task type. Based on these findings, the paper offers lightweight, format-level guidelines for zero-shot structured generation, an immediately deployable performance improvement for resource-constrained or local deployment scenarios.

📝 Abstract
General-purpose language models are trained to produce varied natural language outputs, but for some tasks like annotation or classification we need more specific output formats. LLM systems increasingly support structured output, sampling tokens according to a grammar, which enforces a format but which can also reduce performance. We ask whether there are systematic differences between grammars that appear semantically similar to humans. To answer this question, we test four popular model families with five token formats on four NLP benchmarks. All models perform most accurately when instructed to classify with real numbers. Performance also improves by 5%-10% when models are instructed to return tokens incorporating leading whitespace, which we find can help models avoid structural deficiencies in subword token representations. Format-based differences are largest for smaller models that are often used for local laptop-scale inference. We present best practices for researchers using language models as zero-shot classifiers with structured output.
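The abstract attributes the whitespace effect to "structural deficiencies in subword token representations": in BPE-style vocabularies (e.g., GPT-2's), many common words exist as a single token only in their space-prefixed form, so forcing a grammar to start without a leading space splits the label into unfamiliar subwords. A minimal sketch of this effect, using a hypothetical toy vocabulary and greedy longest-match segmentation (the real tokenizers use learned merge rules, and "Ġ" marks a leading space as in GPT-2's vocabulary):

```python
# Toy illustration (hypothetical vocabulary, not a real tokenizer) of why a
# leading space matters: the space-prefixed form maps to one token, while the
# bare form falls apart into subword pieces.
toy_vocab = {"Ġpositive", "Ġnegative", "pos", "itive", "neg", "ative"}

def greedy_tokenize(text, vocab):
    """Greedy longest-match segmentation over a toy vocabulary."""
    text = text.replace(" ", "Ġ")  # GPT-2-style space marker
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens

print(greedy_tokenize("positive", toy_vocab))   # ['pos', 'itive']
print(greedy_tokenize(" positive", toy_vocab))  # ['Ġpositive']
```

Under this picture, a grammar that permits the space-prefixed label lets the model emit the token it actually saw during training, rather than an out-of-distribution subword split.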
Problem

Research questions and friction points this paper is trying to address.

Choosing token formats for grammar-constrained decoding
Understanding why structured output constraints degrade LLM performance
Improving zero-shot classification accuracy on NLP benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic comparison of five token formats under grammar constraints
Evidence that leading-whitespace tokens improve accuracy by 5–10%
Best practices for zero-shot classification with structured output