Semantic Probabilistic Control of Language Models

📅 2025-05-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
To enforce non-lexical semantic constraints, such as toxicity avoidance, sentiment alignment, and topic consistency, in language model generation, this paper proposes a differentiable semantic control method grounded in verifier gradients. The core innovation is to construct a local distribution over semantically similar sentences via sequence-level verifier gradients, model the constraint-satisfaction probability through an expected sentence embedding, and perform gradient-guided reweighting of the next-token distribution, thereby avoiding inefficient sampling. The approach is efficient (requiring neither reinforcement learning nor resampling) and fully differentiable. Empirical evaluation across toxicity mitigation, sentiment control, and topic-preservation tasks demonstrates constraint-satisfaction rates exceeding 95% while preserving generation quality, including fluency, diversity, and faithfulness.
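The gradient-guided reweighting described in the summary can be sketched as follows. This is a toy illustration, not the paper's implementation: the verifier, the embeddings, and all variable names are hypothetical, and the verifier score of each candidate continuation is estimated by a first-order Taylor expansion around the initial sample's embedding.

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 8, 5  # toy embedding dimension and vocabulary size

def verifier(e):
    # Toy differentiable verifier: probability that embedding e satisfies
    # the constraint, modeled as a logistic score along a fixed direction w.
    w = np.ones(d) / np.sqrt(d)
    return 1.0 / (1.0 + np.exp(-e @ w))

def verifier_grad(e):
    # Gradient of the logistic verifier with respect to the embedding.
    w = np.ones(d) / np.sqrt(d)
    p = verifier(e)
    return p * (1.0 - p) * w

# Embedding of the initial generation, and the (hypothetical) sentence
# embedding that would result from choosing each candidate next token.
e0 = rng.normal(size=d)
cand = rng.normal(size=(vocab, d))

# First-order estimate of each candidate's constraint-satisfaction
# probability, using only the verifier's value and gradient at e0.
scores = verifier(e0) + (cand - e0) @ verifier_grad(e0)

# Reweight the next-token distribution by the estimated probabilities,
# then renormalize; no resampling is needed.
logits = rng.normal(size=vocab)
p = np.exp(logits - logits.max())
p /= p.sum()
p_new = p * np.clip(scores, 1e-6, None)
p_new /= p_new.sum()
```

The key property this sketch preserves is that the verifier is queried only at the initial sample; all candidate tokens are scored from that single evaluation via the gradient.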

📝 Abstract
Semantic control entails steering LM generations towards satisfying subtle non-lexical constraints, e.g., toxicity, sentiment, or politeness, attributes that can be captured by a sequence-level verifier. It can thus be viewed as sampling from the LM distribution conditioned on the target attribute, a computationally intractable problem due to the non-decomposable nature of the verifier. Existing approaches to LM control either deal only with syntactic constraints, which cannot capture the aforementioned attributes, or rely on sampling to explore the conditional LM distribution, an ineffective estimator for low-probability events. In this work, we leverage a verifier's gradient information to efficiently reason over all generations that satisfy the target attribute, enabling precise steering of LM generations by reweighting the next-token distribution. Starting from an initial sample, we create a local LM distribution favoring semantically similar sentences. This approximation enables the tractable computation of an expected sentence embedding. We use this expected embedding, informed by the verifier's evaluation at the initial sample, to estimate the probability of satisfying the constraint, which directly informs the update to the next-token distribution. We evaluated the effectiveness of our approach in controlling the toxicity, sentiment, and topic-adherence of LMs, yielding generations that satisfy the constraint with high probability (>95%) without degrading their quality.
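The "expected sentence embedding" step in the abstract can be illustrated with a toy calculation: weight a small pool of candidate sentences by a local distribution that favors those semantically close to the initial sample, then take the probability-weighted average of their embeddings. All quantities here (the random embeddings, the similarity-based weighting) are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 6, 4  # toy pool size and embedding dimension

# Embeddings of candidate sentences; the first is the initial sample.
emb = rng.normal(size=(n, d))
e0 = emb[0]

# Local distribution favoring sentences semantically similar to the
# initial sample (softmax over dot-product similarity).
sim = emb @ e0
w = np.exp(sim - sim.max())
w /= w.sum()

# Tractable expected sentence embedding under that local distribution:
# a single probability-weighted average, no sampling required.
e_exp = w @ emb
```

Because the expectation reduces to one weighted average, a verifier evaluated at (or linearized around) this point can score the whole neighborhood of the initial sample at once.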
Problem

Research questions and friction points this paper is trying to address.

Steering LM generations to satisfy non-lexical constraints such as toxicity avoidance or sentiment alignment
Overcoming intractability of sampling from conditional LM distributions
Improving control of LM generations without quality degradation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses the verifier's gradient for differentiable semantic control
Precisely reweights the next-token distribution
Computes an expected sentence embedding to estimate constraint satisfaction