Upfront Chain-of-Thought: A Cooperative Framework for Chain-of-Thought Compression

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) suffer from high computational overhead and latency when employing chain-of-thought (CoT) reasoning, while existing CoT compression methods rely on manual prompt engineering or external compressed datasets—often discarding critical reasoning information. To address this, we propose Upfront CoT (UCoT), a novel framework that introduces a lightweight “compressor” model to automatically learn compact, task-relevant reasoning representations *before* answer generation; these embeddings are then consumed by a larger “executor” model for precise final inference. UCoT enables end-to-end joint training without human-designed prompts or external data curation, integrating thought embedding, collaborative workflow, and reward-driven optimization. On GSM8K, UCoT reduces token consumption of Qwen2.5-7B-Instruct by 50% while improving accuracy by 3.08% over the state of the art—demonstrating simultaneous gains in both inference efficiency and performance.

📝 Abstract
Recent developments have enabled advanced reasoning in Large Language Models (LLMs) via long Chain-of-Thought (CoT), but long CoT incurs high computational cost and significant latency owing to the autoregressive nature of generative LLMs. CoT compression aims to improve reasoning efficiency by reducing output length. Prior work trades off reasoning efficiency through either laborious discrete prompt design or the construction of external compressed CoT datasets that sacrifice key reasoning details. In this work, we propose Upfront CoT (UCoT): an efficient reasoning framework with upfront thought embeddings to automate CoT compression. UCoT is a cooperative workflow between a small model (the compressor) and a large model (the executor). In the first stage, UCoT trains the compressor to generate upfront thought embeddings rich in reasoning information for the executor, avoiding the drawbacks of manually designed prompts. In the second stage, UCoT uses a reward mechanism to optimize the executor to exploit the upfront thought embeddings and derive the correct answer with short reasoning. Extensive experiments show that UCoT preserves the executor's strong reasoning ability while significantly reducing CoT length. Notably, when UCoT is applied to the Qwen2.5-7B-Instruct model, token usage on the GSM8K dataset is reduced by 50% while accuracy is 3.08% higher than that of the state-of-the-art (SOTA) method. The code and dataset are in the supplementary material.
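The abstract mentions a reward mechanism that pushes the executor toward correct answers with short reasoning, but does not give its exact form. A minimal sketch of one plausible shaping, assuming a correctness term minus a length penalty (the names `budget` and `lam` are illustrative hyperparameters, not from the paper):

```python
def reward(correct: bool, n_tokens: int, budget: int = 128, lam: float = 0.5) -> float:
    """Hypothetical UCoT-style reward: +1 for a correct final answer,
    minus a penalty that grows with CoT length beyond a token budget.
    `budget` and `lam` are illustrative, not the authors' values."""
    accuracy_term = 1.0 if correct else 0.0
    length_penalty = lam * max(0, n_tokens - budget) / budget
    return accuracy_term - length_penalty

# A correct short answer keeps the full reward; a correct but verbose
# one is discounted, which is what drives CoT compression.
r_short = reward(correct=True, n_tokens=64)    # within budget
r_long = reward(correct=True, n_tokens=192)    # 64 tokens over budget
```

Under this shaping, a correct 64-token trace scores 1.0 while a correct 192-token trace scores 0.75, so the executor is rewarded for staying concise without sacrificing correctness.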
Problem

Research questions and friction points this paper is trying to address.

Reducing computational costs of long Chain-of-Thought reasoning in LLMs
Automating CoT compression without sacrificing key reasoning details
Maintaining reasoning accuracy while significantly reducing token usage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses cooperative small and large model workflow
Trains compressor to generate upfront thought embeddings
Optimizes executor with reward mechanism for short reasoning
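The two-stage workflow above can be sketched in code. This is a toy illustration, not the authors' implementation: `compressor` and `executor` are hypothetical stand-ins for the small and large models, and the character-code "embedding" merely mimics producing a fixed-size upfront thought vector before answer generation.

```python
def compressor(question: str, dim: int = 4) -> list[float]:
    """Toy stand-in for the small compressor model: encodes the question
    into a compact, fixed-size 'upfront thought embedding'.
    (Illustrative encoding only: mean character code per chunk.)"""
    codes = [ord(c) for c in question] or [0]
    chunk = max(1, len(codes) // dim)
    return [sum(codes[i:i + chunk]) / chunk for i in range(0, chunk * dim, chunk)]

def executor(question: str, thought: list[float]) -> str:
    """Toy stand-in for the large executor model: conditions on the
    upfront thought embedding and emits a short answer instead of
    generating a long chain-of-thought token by token."""
    return f"short-answer(question_len={len(question)}, thought_dim={len(thought)})"

# Stage 1: compressor produces the upfront thought embedding.
# Stage 2: executor consumes it for the final, short inference.
question = "Natalia sold 48 clips in April and half as many in May. Total?"
embedding = compressor(question)
answer = executor(question, embedding)
```

The point of the design is that the expensive autoregressive decoding happens only once, in the executor, over a short output; the reasoning burden is front-loaded into the compact embedding.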
Chengzhengxu Li
Xi'an Jiaotong University
Xiaoming Liu
Faculty of Electronic and Information Engineering, Xi'an Jiaotong University
Zhaohan Zhang
Queen Mary University of London
Shaochu Zhang
Faculty of Electronic and Information Engineering, Xi'an Jiaotong University
Shengchao Liu
Faculty of Electronic and Information Engineering, Xi'an Jiaotong University
Guoxin Ma
Faculty of Electronic and Information Engineering, Xi'an Jiaotong University
Yu Lan
Faculty of Electronic and Information Engineering, Xi'an Jiaotong University
Chao Shen
Faculty of Electronic and Information Engineering, Xi'an Jiaotong University