Plan-and-Write: Structure-Guided Length Control for LLMs without Model Retraining

📅 2025-11-03
🤖 AI Summary
This work addresses the challenge of precise output length control in large language models (LLMs), where existing methods often require fine-tuning or architectural modifications. We propose a prompt engineering framework that operates entirely at inference time—without model adaptation—by integrating task decomposition, chain-of-thought reasoning, and dynamic token counting into a structured prompt. This enables the model to autonomously plan its response and monitor its cumulative token count in real time. To our knowledge, this is the first approach to significantly improve adherence to short-to-medium length constraints (e.g., 50–300 tokens) using prompts alone. Evaluated across six state-of-the-art LLMs, it achieves up to a 37.6% improvement in length fidelity on some models while preserving or enhancing generation quality (e.g., coherence, relevance). The method is plug-and-play, incurs negligible deployment overhead, and is particularly suited for applications demanding strict control over response conciseness or comprehensiveness.

📝 Abstract
Length control in Large Language Models (LLMs) is a crucial but under-addressed challenge, with applications ranging from voice interfaces requiring concise responses to research summaries needing comprehensive outputs. Current approaches to length control, including Regularized DPO, Length-Instruction Fine Tuning, and tool-augmented methods, typically require expensive model retraining or complex inference-time tooling. This paper presents a prompt engineering methodology that enables precise length control without model retraining. Our structure-guided approach implements deliberate planning and word counting mechanisms within the prompt, encouraging the model to carefully track and adhere to specified length constraints. Comprehensive evaluations across six state-of-the-art LLMs demonstrate that our method significantly improves length fidelity for several models compared to standard prompting when applied to document summarization tasks, particularly for shorter-to-medium length constraints. The proposed technique shows varying benefits across different model architectures, with some models demonstrating up to 37.6% improvement in length adherence. Quality evaluations further reveal that our approach maintains or enhances overall output quality compared to standard prompting techniques. Our approach provides an immediately deployable solution for applications requiring precise length control, particularly valuable for production environments where model retraining is impractical or cost-prohibitive.
Problem

Research questions and friction points this paper is trying to address.

Enabling precise length control in LLMs without retraining
Addressing length constraints for summarization and voice interfaces
Improving length adherence while maintaining output quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prompt engineering enables length control without retraining
Structure-guided planning with word counting mechanisms
Immediately deployable solution maintaining output quality
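The planning-plus-word-counting mechanism described above can be sketched as a prompt template. The wording and helper function below are a hypothetical reconstruction for illustration, not the authors' actual prompt:

```python
def build_length_controlled_prompt(document: str, target_words: int) -> str:
    """Wrap a summarization request in explicit planning and word-counting
    instructions, so the model budgets and tracks its own output length."""
    return (
        f"Summarize the document below in exactly {target_words} words.\n\n"
        "Follow these steps:\n"
        "1. Plan: list the key points and assign each a word budget, "
        f"so the budgets sum to {target_words}.\n"
        "2. Draft: write the summary point by point, keeping a running "
        "word count after every sentence, e.g. [words so far: 42].\n"
        "3. Check: if the final count differs from the target, revise "
        "until it matches, then output only the final summary.\n\n"
        f"Document:\n{document}\n"
    )

prompt = build_length_controlled_prompt("LLMs often ignore length limits...", 150)
print(prompt)
```

Because the control lives entirely in the prompt, the same template can be sent unchanged to any of the evaluated models, which is what makes the approach plug-and-play.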
Adewale Akinfaderin
Amazon Web Services, Seattle, WA, USA
Shreyas Subramanian
Amazon Web Services
Generative AI · Artificial Intelligence · Deep Learning · Reinforcement Learning · Foundation Models
Akarsha Sehwag
Amazon Web Services, Seattle, WA, USA