CIE: Controlling Language Model Text Generations Using Continuous Signals

📅 2025-05-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of fine-grained, continuous control over textual attributes—such as length, complexity, sentiment, and tone—in large language models. We propose a novel continuous control signal mechanism based on interpolatable embeddings: each attribute dimension is modeled as a linear interpolation vector between “low” and “high” extremal token embeddings in the word embedding space, enabling conditional generation via lightweight fine-tuning. To our knowledge, this is the first approach to achieve spectrum-based, differentiable, and interpolatable textual attribute control. Experiments on response length control demonstrate that our method significantly improves stability and precision over both in-context learning and discrete-label fine-tuning, reducing control error by 37% while exhibiting strong generalization across unseen attribute values. The code and dataset are publicly released.

Technology Category

Application Category

📝 Abstract
Aligning language models with user intent is becoming increasingly relevant to enhance user experience. This calls for designing methods that can allow users to control the properties of the language that LMs generate. For example, controlling the length of the generation, the complexity of the language that gets chosen, the sentiment, tone, etc. Most existing work attempts to integrate users' control by conditioning LM generations on natural language prompts or discrete control signals, which are often brittle and hard to scale. In this work, we are interested in extit{continuous} control signals, ones that exist along a spectrum that can't easily be captured in a natural language prompt or via existing techniques in conditional generation. Through a case study in controlling the precise response-length of generations produced by LMs, we demonstrate how after fine-tuning, behaviors of language models can be controlled via continuous signals -- as vectors that are interpolated between a"low"and a"high"token embedding. Our method more reliably exerts response-length control than in-context learning methods or fine-tuning methods that represent the control signal as a discrete signal. Our full open-sourced code and datasets are available at https://github.com/vsamuel2003/CIE.
Problem

Research questions and friction points this paper is trying to address.

Control language model text generation properties
Use continuous signals for precise generation control
Improve reliability over discrete signal methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses continuous signals for LM control
Interpolates vectors between low and high embeddings
Fine-tunes models for precise response-length control
🔎 Similar Papers
No similar papers found.