Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning

📅 2025-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Current LLMs and multimodal LLMs (MLLMs) apply lengthy chain-of-thought (CoT) reasoning indiscriminately during inference, which reduces efficiency and degrades performance on simple tasks. Method: We propose a perplexity-based adaptive inference routing mechanism: (i) we empirically demonstrate that extending CoT does not necessarily improve accuracy, and can even harm it; (ii) we design an uncertainty-driven dynamic switch that returns a direct, concise answer when the model is confident (low perplexity) and triggers CoT only when it is not; (iii) we establish a joint evaluation framework spanning multimodal visual question answering (VQA), key information extraction (KIE), and textual reasoning. Results: Across multiple benchmarks, our method achieves an average accuracy gain of 3.2%, reduces inference token usage by 37%, and cuts response latency by 41%, improving both accuracy and efficiency.

📝 Abstract

Recent advancements in reasoning have significantly enhanced the capabilities of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) across diverse tasks. However, excessive reliance on chain-of-thought (CoT) reasoning can impair model performance and produce unnecessarily long outputs, reducing efficiency. Our work reveals that prolonged reasoning does not universally improve accuracy and can even degrade performance on simpler tasks. To address this, we propose Certainty-based Adaptive Reasoning (CAR), a novel framework that dynamically switches between short answers and long-form reasoning based on the model's perplexity. CAR first generates a short answer and evaluates its perplexity, triggering reasoning only when the model exhibits low confidence (i.e., high perplexity). Experiments across diverse multimodal VQA/KIE benchmarks and text reasoning datasets show that CAR outperforms both short-answer and long-form reasoning approaches, striking an optimal balance between accuracy and efficiency.
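
The routing step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `perplexity` and `car_route`, the two callables standing in for the model, and the fixed threshold value are all assumptions introduced here. The abstract does not say how the confidence decision is actually made.

```python
import math
from typing import Callable, List, Tuple


def perplexity(token_logprobs: List[float]) -> float:
    """Perplexity of a generated answer from its per-token natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def car_route(
    question: str,
    short_answer_fn: Callable[[str], Tuple[str, List[float]]],  # hypothetical model call
    long_reasoning_fn: Callable[[str], str],                    # hypothetical model call
    ppl_threshold: float = 1.5,                                 # assumed value, not from the paper
) -> Tuple[str, float, str]:
    """Return (answer, short-answer perplexity, route taken)."""
    # Step 1: always produce the cheap short answer first.
    answer, logprobs = short_answer_fn(question)
    ppl = perplexity(logprobs)
    # Step 2: low perplexity means high confidence, so keep the short answer.
    if ppl <= ppl_threshold:
        return answer, ppl, "short"
    # Step 3: otherwise fall back to long-form reasoning.
    return long_reasoning_fn(question), ppl, "long"
```

In this sketch the long-form path is only paid for when the short answer looks uncertain, which is where the token and latency savings would come from. A real system would obtain the log-probabilities from the model's decoding output and would need to calibrate the threshold per task.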

Problem

Research questions and friction points this paper is trying to address.

Excessive chain-of-thought reasoning reduces LLM/MLLM efficiency

Prolonged reasoning may degrade performance on simpler tasks

Need for dynamic reasoning to balance accuracy and efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic switching between short and long reasoning

Certainty-based Adaptive Reasoning (CAR) framework

Perplexity evaluation for confidence-based routing
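
For reference, the standard definition of perplexity for a generated answer $y = (y_1, \dots, y_n)$ given an input $x$ is shown here. The symbols $x$, $y_i$, $n$, and the threshold $\tau$ are notation chosen for this note, and the simple threshold rule is an assumed illustration of "confidence-based routing" rather than the paper's stated decision procedure.

```latex
\mathrm{PPL}(y \mid x) = \exp\!\left( -\frac{1}{n} \sum_{i=1}^{n} \log p\left(y_i \mid x, y_{<i}\right) \right),
\qquad
\text{route} =
\begin{cases}
\text{short answer} & \text{if } \mathrm{PPL}(y \mid x) \le \tau, \\
\text{long-form reasoning} & \text{otherwise.}
\end{cases}
```

A perplexity of 1 means the model assigned probability 1 to every generated token; larger values indicate lower confidence in the short answer.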

🔎 Similar Papers

Semantic Self-Consistency: Enhancing Language Model Reasoning via Semantic Weighting