🤖 AI Summary
This work addresses the challenge of precise output length control in large language models (LLMs), where existing methods often require fine-tuning or architectural modifications. We propose a prompt engineering framework that operates entirely at inference time, without any model adaptation, by integrating task decomposition, chain-of-thought reasoning, and dynamic token counting into a structured prompt. This enables the model to autonomously plan its response and monitor its cumulative token count in real time. To our knowledge, this is the first approach to significantly improve adherence to short-to-medium length constraints (e.g., 50–300 tokens) using prompts alone. Evaluated across six state-of-the-art LLMs, it improves length fidelity by up to 37.6% while preserving or enhancing generation quality (e.g., coherence, relevance). The method is plug-and-play, incurs negligible deployment overhead, and is particularly suited to applications demanding strict control over response conciseness or comprehensiveness.
📝 Abstract
Length control in Large Language Models (LLMs) is a crucial but under-addressed challenge, with applications ranging from voice interfaces that require concise responses to research summaries that need comprehensive outputs. Current approaches to length control, including Regularized DPO, Length-Instruction Fine-Tuning, and tool-augmented methods, typically require expensive model retraining or complex inference-time tooling. This paper presents a prompt engineering methodology that enables precise length control without model retraining. Our structure-guided approach embeds deliberate planning and word-counting mechanisms within the prompt, encouraging the model to carefully track and adhere to specified length constraints. Comprehensive evaluations across six state-of-the-art LLMs demonstrate that, on document summarization tasks, our method significantly improves length fidelity for several models compared to standard prompting, particularly for short-to-medium length constraints. The benefits vary across model architectures, with some models showing up to a 37.6% improvement in length adherence. Quality evaluations further show that our approach maintains or enhances overall output quality relative to standard prompting. The proposed technique offers an immediately deployable solution for applications requiring precise length control, and is particularly valuable in production environments where model retraining is impractical or cost-prohibitive.
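The structure-guided prompting described above can be sketched as a simple template. This is a minimal illustration only: the function name and the exact prompt wording are assumptions, not the paper's published prompt, but it shows the core idea of embedding planning, step-by-step reasoning, and running word-count tracking directly into the instructions.

```python
def build_length_controlled_prompt(document: str, target_words: int) -> str:
    """Assemble a structure-guided prompt that embeds planning and
    word-counting instructions (illustrative template, not the paper's
    exact wording)."""
    return (
        f"Summarize the document below in exactly {target_words} words.\n"
        "Follow these steps before writing your final answer:\n"
        "1. Plan: list the key points the summary must cover.\n"
        "2. Reason step by step about how many words to allocate to each point.\n"
        "3. Draft the summary, tracking your cumulative word count as you write.\n"
        "4. If the running count drifts from the target, revise before finishing.\n\n"
        f"Document:\n{document}\n\n"
        "Summary:"
    )

# Example usage: the resulting string is sent as-is to any LLM,
# with no fine-tuning or inference-time tooling required.
prompt = build_length_controlled_prompt("LLMs are widely deployed ...", 100)
print(prompt)
```

Because the technique lives entirely in the prompt string, it is model-agnostic: the same template can be sent to any of the six evaluated LLMs without modification.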