$\texttt{SEM-CTRL}$: Semantically Controlled Decoding

📅 2025-03-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Ensuring both syntactic and semantic correctness of large language model (LLM) outputs remains a critical challenge. This paper proposes a decoding-time control framework that enforces output correctness for off-the-shelf LLMs without fine-tuning. Methodologically, it combines (1) Answer Set Grammars (ASGs), a logic-based formalism that encodes context-sensitive, task- and instance-level semantic constraints and can incorporate background knowledge, and (2) a constraint-guided, token-level Monte Carlo Tree Search (MCTS) that intervenes in decoding at the granularity of individual tokens. Experiments on synthetic grammar synthesis, combinatorial reasoning, and planning show that small pre-trained LLMs, with completions guaranteed to satisfy the constraints, outperform larger variants and state-of-the-art reasoning models (e.g., o1-preview). The framework thus enforces correctness at inference time without updating model parameters.

📝 Abstract

Ensuring both syntactic and semantic correctness in Large Language Model (LLM) outputs remains a significant challenge, despite being critical for real-world deployment. In this paper, we introduce $\texttt{SEM-CTRL}$, a unified approach that enforces rich context-sensitive constraints and task- and instance-specific semantics directly on an LLM decoder. Our approach integrates token-level MCTS, which is guided by specific syntactic and semantic constraints. The constraints over the desired outputs are expressed using Answer Set Grammars -- a logic-based formalism that generalizes context-sensitive grammars while incorporating background knowledge to represent task-specific semantics. We show that our approach guarantees correct completions for any off-the-shelf LLM without the need for fine-tuning. We evaluate $\texttt{SEM-CTRL}$ on a range of tasks, including synthetic grammar synthesis, combinatorial reasoning, and planning. Our results demonstrate that $\texttt{SEM-CTRL}$ allows small pre-trained LLMs to efficiently outperform larger variants and state-of-the-art reasoning models (e.g., o1-preview) while simultaneously guaranteeing solution correctness.
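To make the idea of enforcing context-sensitive constraints on a decoder concrete, here is a minimal sketch, not the paper's implementation. It assumes a hand-written Python predicate as a stand-in for an Answer Set Grammar, uses the context-sensitive language a^n b^n c^n as the target, and replaces the LLM with a hypothetical `toy_scores` function that deliberately prefers tokens the constraint forbids. All names (`is_valid_prefix`, `is_complete`, `toy_scores`, `constrained_greedy_decode`) are illustrative assumptions.

```python
VOCAB = ["a", "b", "c", "<eos>"]


def _counts(tokens):
    """Split a token list into leading a's, then b's, then c's, plus leftovers."""
    s = "".join(tokens)
    na = len(s) - len(s.lstrip("a"))
    rest = s[na:]
    nb = len(rest) - len(rest.lstrip("b"))
    rest = rest[nb:]
    nc = len(rest) - len(rest.lstrip("c"))
    return na, nb, nc, rest[nc:]


def is_valid_prefix(tokens):
    """True if tokens can still be extended to some a^n b^n c^n with n >= 1."""
    na, nb, nc, leftover = _counts(tokens)
    if leftover:                 # symbols appear out of order
        return False
    if nb > na or nc > nb:       # too many b's or c's already
        return False
    if nc > 0 and nb != na:      # c's may only start once the b's match the a's
        return False
    return True


def is_complete(tokens):
    """True if tokens spell exactly a^n b^n c^n with n >= 1."""
    na, nb, nc, leftover = _counts(tokens)
    return not leftover and na >= 1 and na == nb == nc


def toy_scores(prefix):
    """Hypothetical stand-in for LLM log-probabilities (an assumption)."""
    if len(prefix) < 2:
        return {"a": 0.0, "b": -1.0, "c": -2.0, "<eos>": -3.0}
    # After two tokens the toy model wants to stop early, which is invalid.
    return {"<eos>": 0.0, "c": -1.0, "b": -2.0, "a": -3.0}


def constrained_greedy_decode(max_len=20):
    prefix = []
    for _ in range(max_len):
        scores = toy_scores(prefix)
        # Keep only tokens that leave the output completable under the constraint.
        allowed = [t for t in VOCAB[:-1] if is_valid_prefix(prefix + [t])]
        if is_complete(prefix):
            allowed.append("<eos>")
        best = max(allowed, key=lambda t: scores[t])
        if best == "<eos>":
            break
        prefix.append(best)
    return "".join(prefix)


print(constrained_greedy_decode())
```

Even though the toy scorer ranks `<eos>` and `c` highest after two tokens, the mask forces the decoder to emit the matching `b`s and `c`s first. Greedy masking alone only guarantees validity; the abstract's approach additionally uses MCTS to search among valid continuations.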

Problem

Research questions and friction points this paper is trying to address.

How to ensure both syntactic and semantic correctness in LLM outputs.

How to integrate token-level MCTS with syntactic and semantic constraints.

How to guarantee correct completions without fine-tuning the LLM.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Token-level MCTS guided by constraints

Answer Set Grammars for semantic control

No fine-tuning required for LLMs
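The combination of token-level MCTS and constraint checking listed above can be sketched on a toy task. This is a simplified illustration under stated assumptions, not the paper's algorithm: the constraint ("digit tokens must sum to `TARGET`") is a plain Python function rather than an Answer Set Grammar, rollouts are uniformly random instead of being guided by an LLM, and the reward (shorter valid sequences score higher) is invented for the example. The names `allowed`, `mcts_step`, and `mcts_decode` are hypothetical.

```python
import math
import random

VOCAB = ["1", "2", "3", "<eos>"]
TARGET = 6  # toy semantic constraint: digits must sum to exactly this value


def allowed(prefix):
    """Tokens that keep the sequence completable under the constraint."""
    total = sum(int(t) for t in prefix)
    out = [t for t in VOCAB[:-1] if total + int(t) <= TARGET]
    if total == TARGET:
        out.append("<eos>")
    return out


def is_terminal(prefix):
    return bool(prefix) and prefix[-1] == "<eos>"


def reward(prefix):
    """Assumed reward: fewer digit tokens is better."""
    return 1.0 / (len(prefix) - 1)


class Node:
    def __init__(self, prefix, parent=None):
        self.prefix, self.parent = prefix, parent
        self.children, self.visits, self.value = {}, 0, 0.0


def ucb(child, parent_visits, c=1.4):
    if child.visits == 0:
        return math.inf
    exploit = child.value / child.visits
    return exploit + c * math.sqrt(math.log(parent_visits) / child.visits)


def rollout(prefix, rng):
    # Random completion that only ever picks constraint-respecting tokens.
    while not is_terminal(prefix):
        prefix = prefix + [rng.choice(allowed(prefix))]
    return reward(prefix)


def mcts_step(root, rng):
    node = root
    while not is_terminal(node.prefix):
        untried = [t for t in allowed(node.prefix) if t not in node.children]
        if untried:  # expansion: add one new constraint-respecting child
            tok = rng.choice(untried)
            node.children[tok] = Node(node.prefix + [tok], node)
            node = node.children[tok]
            break
        # selection: descend to the child with the highest UCB score
        node = max(node.children.values(), key=lambda ch: ucb(ch, node.visits))
    r = rollout(node.prefix, rng)
    while node is not None:  # backpropagation
        node.visits += 1
        node.value += r
        node = node.parent


def mcts_decode(iterations_per_token=200, seed=0):
    rng = random.Random(seed)
    prefix = []
    while not is_terminal(prefix):
        root = Node(prefix)
        for _ in range(iterations_per_token):
            mcts_step(root, rng)
        # commit to the most-visited next token, then search again from there
        best = max(root.children.items(), key=lambda kv: kv[1].visits)[0]
        prefix = prefix + [best]
    return prefix[:-1]  # drop "<eos>"


print(mcts_decode())
```

Because every expansion and rollout step draws only from `allowed`, any sequence the search returns satisfies the constraint by construction; the search budget affects only which valid sequence is chosen.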
