Efficient Reasoning via Thought Compression for Language Segmentation

📅 2026-04-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the high computational cost and verbose outputs of chain-of-thought (CoT) reasoning in language-guided segmentation, which hinder practical deployment. The authors propose WISE, a novel framework that integrates reasoning compression directly into training through a “think twice” paradigm: the model first generates a concise rationale, then produces both the answer and a detailed explanation. By combining autoregressive modeling with a self-distillation objective, WISE compels the model to internalize complex reasoning into compact summaries. During inference, only the concise rationale is retained, and a streamlined prompting strategy, WISE-S, is introduced to mitigate distribution shift. Evaluated on ReasonSeg, the method achieves state-of-the-art zero-shot performance with 58.3 cIoU while reducing average reasoning length from 112 to 23 tokens—nearly a fivefold compression.
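The WISE-S idea described above — injecting a brevity-focused instruction into the user's query at inference time so the model activates its learned concise policy — can be sketched as follows. The exact instruction wording and function name are assumptions for illustration, not taken from the paper.

```python
# Hypothetical wording of the brevity instruction; the paper does not specify it.
BREVITY_HINT = "Think briefly and answer with a concise rationale only."

def wise_s_prompt(user_query: str) -> str:
    """Append the brevity-focused instruction to the user's query,
    mimicking the streamlined prompting strategy WISE-S."""
    return f"{user_query.rstrip()} {BREVITY_HINT}"

prompt = wise_s_prompt("Segment the object used for drinking coffee.")
```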
📝 Abstract
Chain-of-thought (CoT) reasoning has significantly improved the performance of large multimodal models in language-guided segmentation, yet its prohibitive computational cost, stemming from generating verbose rationales, limits real-world applicability. We introduce WISE (Wisdom from Internal Self-Exploration), a novel paradigm for efficient reasoning guided by the principle of "thinking twice -- once for learning, once for speed". WISE trains a model to generate a structured sequence: a concise rationale, the final answer, and then a detailed explanation. By placing the concise rationale first, our method leverages autoregressive conditioning to enforce that the concise rationale acts as a sufficient summary for generating the detailed explanation. This structure is reinforced by a self-distillation objective that jointly rewards semantic fidelity and conciseness, compelling the model to internalize its detailed reasoning into a compact form. At inference, the detailed explanation is omitted. To address the resulting conditional distribution shift, our inference strategy, WISE-S, employs a simple prompting technique that injects a brevity-focused instruction into the user's query. This final adjustment facilitates the robust activation of the learned concise policy, unlocking the full benefits of our framework. Extensive experiments show that WISE-S achieves state-of-the-art zero-shot performance on the ReasonSeg benchmark with 58.3 cIoU, while reducing the average reasoning length by nearly 5× -- from 112 to just 23 tokens. Code is available at https://github.com/mrazhou/WISE.
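The training-time structure the abstract describes — concise rationale, then answer, then detailed explanation, scored by a joint fidelity-plus-conciseness objective — can be sketched minimally as below. The segment markers, function names, and weighting constant are illustrative assumptions; the paper's actual loss formulation is not reproduced here.

```python
def build_training_target(concise: str, answer: str, detailed: str) -> str:
    """Order the segments so that, under autoregressive conditioning, the
    concise rationale precedes (and so must summarize) the detailed one.
    The tag names are hypothetical."""
    return (f"<think>{concise}</think> "
            f"<answer>{answer}</answer> "
            f"<explain>{detailed}</explain>")

def self_distillation_score(concise_tokens: list,
                            detailed_tokens: list,
                            semantic_sim: float,
                            alpha: float = 0.5) -> float:
    """Toy joint reward: semantic fidelity between the two rationales plus a
    conciseness term that grows as the concise rationale shrinks relative to
    the detailed one. alpha is an assumed mixing weight."""
    conciseness = 1.0 - len(concise_tokens) / max(len(detailed_tokens), 1)
    return alpha * semantic_sim + (1.0 - alpha) * conciseness

target = build_training_target("red mug on the desk", "[SEG]",
                               "The query refers to the object used for ...")
# With perfect fidelity and the paper's reported average lengths (23 vs. 112
# tokens), the conciseness term rewards the roughly 5x compression.
score = self_distillation_score(["t"] * 23, ["t"] * 112, semantic_sim=1.0)
```

At inference only the `<think>` and `<answer>` segments would be generated; the `<explain>` segment exists purely as a training-time teacher signal.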
Problem

Research questions and friction points this paper is trying to address.

Chain-of-thought reasoning
language-guided segmentation
computational cost
reasoning efficiency
multimodal models
Innovation

Methods, ideas, or system contributions that make the work stand out.

thought compression
efficient reasoning
chain-of-thought
self-distillation
language-guided segmentation