🤖 AI Summary
Large language models (LLMs) incur high computational overhead and latency in complex reasoning due to reliance on lengthy chain-of-thought (CoT) prompting.
Method: This paper introduces Chain of Draft (CoD), a novel reasoning paradigm inspired by how humans cognitively refine drafts. CoD guides LLMs to generate minimal yet information-complete intermediate reasoning steps that retain only the essential logical elements, via prompt engineering alone, without fine-tuning or additional parameters, and it is broadly compatible with both open- and closed-source LLMs.
Contribution/Results: On diverse reasoning benchmarks, CoD matches or exceeds CoT accuracy while reducing inference token consumption to 7.6%–32% of CoT’s usage, substantially lowering computational cost and response latency. To our knowledge, this is the first work to formalize human draft-based reasoning cognition into a lightweight, deployable LLM inference paradigm.
📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable performance in solving complex reasoning tasks through mechanisms like Chain-of-Thought (CoT) prompting, which emphasizes verbose, step-by-step reasoning. However, humans typically employ a more efficient strategy: drafting concise intermediate thoughts that capture only essential information. In this work, we propose Chain of Draft (CoD), a novel paradigm inspired by human cognitive processes, where LLMs generate minimalistic yet informative intermediate reasoning outputs while solving tasks. By reducing verbosity and focusing on critical insights, CoD matches or surpasses CoT in accuracy while using as little as 7.6% of the tokens, significantly reducing cost and latency across various reasoning tasks.
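The contrast between CoT and CoD can be sketched as a difference in system prompts. The prompt wordings and the toy question-and-answer transcripts below are illustrative assumptions paraphrasing the paradigm described above, not the paper's exact prompts or real model completions:

```python
# Sketch of Chain-of-Draft (CoD) vs. Chain-of-Thought (CoT) prompting.
# Prompt wordings and transcripts are illustrative assumptions, not
# taken verbatim from the paper or from a real LLM.

COT_SYSTEM = (
    "Think step by step to answer the question. "
    "Return the answer at the end after a separator ####."
)

COD_SYSTEM = (
    "Think step by step, but keep only a minimum draft for each "
    "thinking step, with at most five words per step. "
    "Return the answer at the end after a separator ####."
)

def build_prompt(system: str, question: str) -> str:
    """Assemble a single-turn prompt for a generic chat LLM."""
    return f"{system}\n\nQ: {question}\nA:"

def count_tokens(text: str) -> int:
    """Crude whitespace token count, a stand-in for a real tokenizer."""
    return len(text.split())

question = (
    "Jason had 20 lollipops. He gave Denny some lollipops. "
    "Now Jason has 12 lollipops. How many did he give to Denny?"
)

# Hypothetical completions contrasting the two styles:
cot_answer = (
    "Jason started with 20 lollipops. After giving some to Denny, "
    "he has 12 left. So the number given away is 20 - 12 = 8. #### 8"
)
cod_answer = "20 - x = 12; x = 8. #### 8"

print(count_tokens(cot_answer), count_tokens(cod_answer))
```

Both prompts request the same final-answer separator, so answer extraction is unchanged; only the length of the intermediate reasoning differs, which is where the reported token savings come from.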