🤖 AI Summary
This work proposes Batch-of-Thought (BoT), a training-free framework for batched joint reasoning. It addresses a limitation of existing large language model inference systems, which process queries in isolation and thereby overlook shared reasoning patterns and consistency constraints across instances. At inference time, BoT evaluates related queries holistically through a Reflector module within a multi-agent reflective architecture, uncovering mutual information inaccessible to isolated reasoning. This enables reuse of reasoning templates, error detection via consistency checks, and amortization of computational cost across the batch. Experiments across three model families and six benchmarks show that BoT significantly improves accuracy and confidence calibration while reducing inference costs by up to 61%.
📝 Abstract
Current Large Language Model reasoning systems process queries independently, discarding valuable cross-instance signals such as shared reasoning patterns and consistency constraints. We introduce Batch-of-Thought (BoT), a training-free method that processes related queries jointly to enable cross-instance learning. By performing comparative analysis across batches, BoT identifies high-quality reasoning templates, detects errors through consistency checks, and amortizes computational costs. We instantiate BoT within a multi-agent reflection architecture (BoT-R), where a Reflector performs joint evaluation to unlock mutual information gain unavailable in isolated processing. Experiments across three model families and six benchmarks demonstrate that BoT-R consistently improves accuracy and confidence calibration while reducing inference costs by up to 61%. Our theoretical and experimental analysis reveals when and why batch-aware reasoning benefits LLM systems.
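To make the core idea concrete, here is a minimal toy sketch of cross-instance template reuse and batch-level consistency checking. This is not the paper's implementation: the queries are trivial arithmetic stand-ins for "related instances," and the names `extract_template` and `batch_of_thought` are illustrative, not from the source. The point is only the shape of the computation: reason fully about one instance, reuse the extracted template for the rest of the batch, then apply a Reflector-style sanity check jointly over all answers.

```python
# Toy sketch of Batch-of-Thought-style amortization (illustrative names,
# not the paper's API). One instance of a query family is "reasoned
# about" fully; the extracted template is then reused cheaply for the
# remaining instances, and a joint check runs over the whole batch.

def extract_template(query: str):
    """Stand-in for full reasoning on one instance: recognize the
    'a+b' family and return a reusable solution template."""
    assert "+" in query, "toy family: additive queries only"
    return lambda q: sum(int(x) for x in q.split("+"))

def batch_of_thought(queries):
    template = extract_template(queries[0])      # full reasoning, once
    answers = {q: template(q) for q in queries}  # amortized reuse
    # Reflector-style joint evaluation: a cross-batch consistency check
    # that an isolated, per-query system could not perform.
    assert all(isinstance(v, int) for v in answers.values())
    return answers

print(batch_of_thought(["2+3", "10+7", "4+4"]))
```

In the real system the "template" would be a reasoning trace rather than a parser, and the Reflector's joint evaluation would compare full solutions rather than types, but the cost structure is the same: one expensive derivation amortized over a batch, plus a consistency pass that only exists because the queries are processed together.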