Fuel Gauge: Estimating Chain-of-Thought Length Ahead of Time in Large Multimodal Models

📅 2026-03-11
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the unpredictable reasoning length of Chain-of-Thought (CoT) in large multimodal models, which leads to inefficient resource utilization and degraded accuracy. The authors propose Fuel Gauge, a method that enables, for the first time, ahead-of-time prediction of CoT length. By uncovering a hidden "fuel" signal whose behavior is independent of the specific generated samples, Fuel Gauge models CoT behavior and extracts key parameters to estimate the required reasoning depth. This estimate enables system-level optimizations such as KV cache pre-allocation and CoT length modulation. Evaluated across multiple multimodal question-answering benchmarks, Fuel Gauge demonstrates substantial efficiency gains: it reduces prediction error by 50% on GPQA-Diamond and decreases memory allocation frequency by a factor of 13.37.

๐Ÿ“ Abstract
Reasoning Large Multi-modality Models (LMMs) have become the de facto choice for many applications. However, these models rely on a Chain-of-Thought (CoT) process that is lengthy and unpredictable at runtime, often resulting in inefficient use of computational resources (due to memory fragmentation) and sub-optimal accuracy (due to under- and over-thinking). We observe empirically that the CoT process follows a very simple form, whose behavior is independent of the specific generated samples. This suggests that the CoT length can be estimated ahead of time based on a hidden parameter representing the amount of "fuel" available to support the reasoning process. Based on this insight, we propose Fuel Gauge, the first method which extracts this hidden signal and predicts CoT length ahead of time. We demonstrate the utility of Fuel Gauge on two downstream tasks: predictive KV cache allocation, which addresses memory fragmentation in LMM serving systems, and CoT length modulation, which mitigates under-thinking and over-thinking. Extensive experiments on LMMs across text-only, image-text, and video-text question answering benchmarks demonstrate the effectiveness, generalizability, and practical value of our Fuel Gauge. For example, on the GPQA-Diamond benchmark, our Fuel Gauge achieves less than half the CoT length prediction error compared to the baseline; this translates into a 13.37x reduction in the memory allocation frequency.
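The abstract does not specify how the fuel signal is extracted, but the core idea can be illustrated with a minimal sketch: if a scalar "fuel" value is consumed at a roughly constant per-token rate, its initial amount and burn rate yield an ahead-of-time length estimate, which in turn lets a serving system reserve KV cache in one allocation instead of growing it incrementally. All function names and the linear-decay assumption below are hypothetical, not the paper's actual method.

```python
# Toy model (assumption): CoT length ~ initial_fuel / burn_rate,
# i.e. fuel decays roughly linearly per generated reasoning token.

def estimate_cot_length(initial_fuel: float, burn_rate: float) -> int:
    """Predict the number of reasoning tokens before fuel is exhausted."""
    if burn_rate <= 0:
        raise ValueError("burn_rate must be positive")
    return max(1, round(initial_fuel / burn_rate))

def kv_cache_budget(predicted_len: int, prompt_len: int, margin: float = 0.1) -> int:
    """Tokens of KV cache to reserve up front, padded by a safety margin.

    Reserving once (instead of repeatedly extending the cache as tokens
    are generated) is what reduces allocation frequency and fragmentation.
    """
    return prompt_len + int(predicted_len * (1.0 + margin))

# Hypothetical usage: 120 units of fuel burned at 0.25 units/token.
predicted = estimate_cot_length(initial_fuel=120.0, burn_rate=0.25)
budget = kv_cache_budget(predicted, prompt_len=64)
```

Under these made-up numbers the estimator predicts 480 reasoning tokens, so the server reserves a single 592-token KV cache slab rather than allocating on demand as the CoT unfolds.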
Problem

Research questions and friction points this paper is trying to address.

Chain-of-Thought
Large Multimodal Models
reasoning efficiency
memory fragmentation
CoT length prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Thought prediction
Large Multimodal Models
KV cache allocation
reasoning efficiency
Fuel Gauge