🤖 AI Summary
Ensuring both syntactic and semantic correctness of large language model (LLM) outputs remains a critical challenge. This paper proposes SEM-CTRL, a unified decoding-time control framework that enforces output correctness for off-the-shelf LLMs without fine-tuning. Methodologically, it combines (1) Answer Set Grammars (ASGs), a logic-based formalism that encodes context-sensitive syntax together with task- and instance-level semantic constraints and can incorporate background knowledge, and (2) a token-level Monte Carlo Tree Search (MCTS) guided by those constraints, which intervenes in decoding at the granularity of individual tokens. Experiments cover synthetic grammar synthesis, combinatorial reasoning, and planning: small pre-trained LLMs outperform larger variants and state-of-the-art reasoning models (e.g., o1-preview) while solution correctness is guaranteed. The framework thus enforces correctness at inference time without any additional training.
📝 Abstract
Ensuring both syntactic and semantic correctness in Large Language Model (LLM) outputs remains a significant challenge, despite being critical for real-world deployment. In this paper, we introduce $\texttt{SEM-CTRL}$, a unified approach that enforces rich context-sensitive constraints and task- and instance-specific semantics directly on an LLM decoder. Our approach integrates token-level MCTS, which is guided by specific syntactic and semantic constraints. The constraints over the desired outputs are expressed using Answer Set Grammars -- a logic-based formalism that generalizes context-sensitive grammars while incorporating background knowledge to represent task-specific semantics. We show that our approach guarantees correct completions for any off-the-shelf LLM without the need for fine-tuning. We evaluate $\texttt{SEM-CTRL}$ on a range of tasks, including synthetic grammar synthesis, combinatorial reasoning, and planning. Our results demonstrate that $\texttt{SEM-CTRL}$ allows small pre-trained LLMs to efficiently outperform larger variants and state-of-the-art reasoning models (e.g., o1-preview) while simultaneously guaranteeing solution correctness.
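
To make the idea of constraint-guided, token-level MCTS concrete, here is a minimal Python sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the context-sensitive toy language a^n b^n c^n, the hand-written `allowed_next` check (a stand-in for an Answer Set Grammar and its solver), the fixed next-token distribution inside `masked_probs` (a stand-in for an LLM), the PUCT-style selection rule in `ucb`, and the 0/1 `reward` for hitting a hypothetical target length `target_n`.

```python
import math
import random

MAX_N = 4  # hypothetical bound so the toy search space stays finite


def allowed_next(prefix):
    """Tokens that keep `prefix` extendable to a^n b^n c^n with 1 <= n <= MAX_N."""
    i, j, k = prefix.count("a"), prefix.count("b"), prefix.count("c")
    if k > 0:  # inside the c-block
        return ["c"] if k < i else ["<eos>"]
    if j > 0:  # inside the b-block
        return ["b"] if j < i else ["c"]
    if i > 0:  # inside the a-block
        return ["a", "b"] if i < MAX_N else ["b"]
    return ["a"]


def is_member(tokens):
    """True if `tokens` is exactly a^n b^n c^n for some n >= 1."""
    n = len(tokens) // 3
    return n >= 1 and tokens == ["a"] * n + ["b"] * n + ["c"] * n


def masked_probs(prefix):
    # Toy stand-in for an LLM's next-token distribution; it ignores the grammar.
    lm = {"a": 0.4, "b": 0.3, "c": 0.2, "<eos>": 0.1}
    allowed = allowed_next(prefix)
    total = sum(lm[t] for t in allowed)
    return {t: lm[t] / total for t in allowed}  # renormalise over allowed tokens


def step(prefix, token):
    """Apply one token; returns (new_prefix, done)."""
    return (prefix, True) if token == "<eos>" else (prefix + [token], False)


def rollout(prefix, done, rng):
    """Sample from the masked distribution until the sequence is complete."""
    while not done:
        probs = masked_probs(prefix)
        token = rng.choices(list(probs), weights=list(probs.values()))[0]
        prefix, done = step(prefix, token)
    return prefix


def reward(tokens, target_n):
    # Hypothetical instance-level objective: prefer the string with n == target_n.
    return 1.0 if tokens.count("a") == target_n else 0.0


class Node:
    def __init__(self, prefix, done=False):
        self.prefix, self.done = prefix, done
        self.children, self.visits, self.value = {}, 0, 0.0


def ucb(parent, child, prior, c):
    # PUCT-style score: mean reward plus a prior-weighted exploration bonus.
    q = child.value / child.visits if child.visits else 0.0
    return q + c * prior * math.sqrt(parent.visits) / (1 + child.visits)


def mcts_decode(target_n, simulations=200, c=1.4, seed=0):
    rng = random.Random(seed)
    prefix, done = [], False
    while not done:
        root = Node(prefix)
        for _ in range(simulations):
            node, path = root, [root]
            while not node.done:
                probs = masked_probs(node.prefix)
                untried = [t for t in probs if t not in node.children]
                if untried:  # expansion: only constraint-satisfying tokens
                    child = Node(*step(node.prefix, untried[0]))
                    node.children[untried[0]] = child
                    node = child
                    path.append(node)
                    break
                # selection among fully expanded children
                token = max(probs, key=lambda t: ucb(node, node.children[t], probs[t], c))
                node = node.children[token]
                path.append(node)
            r = reward(rollout(node.prefix, node.done, rng), target_n)
            for visited in path:  # backpropagation
                visited.visits += 1
                visited.value += r
        best = max(root.children, key=lambda t: root.children[t].visits)
        prefix, done = step(prefix, best)
    return prefix


if __name__ == "__main__":
    tokens = mcts_decode(target_n=3)
    print("".join(tokens), is_member(tokens))
```

Because expansion and rollouts only ever draw from `allowed_next`, every completed sequence is a member of the toy language regardless of how the search behaves; the search only influences which valid sequence is returned. This mirrors, in a very reduced form, the separation the abstract describes between guaranteed correctness (from the constraints) and solution quality (from the search).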