🤖 AI Summary
In zero-shot reasoning, large language models (LLMs) lack interpretability and the ability to self-correct. Method: This paper proposes a zero-shot verification-guided reasoning paradigm. It introduces a COT STEP decomposition prompt that automatically breaks complex reasoning into atomic steps, and designs two zero-shot self-verification prompts: one for independent step-level scoring and one for global consistency checking. Without fine-tuned verifiers, human annotations, or exemplars, the approach uses prompt engineering alone to drive LLMs to generate reasoning chains, verify them step by step, and perform verification-guided resampling or reranking. Results: On multiple mathematical and commonsense reasoning benchmarks, the method achieves an average 11.3% absolute improvement in answer accuracy, and verifier classification accuracy reaches 72–89%, substantially outperforming standard zero-shot chain-of-thought baselines.
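The decomposition step above can be sketched in a few lines. The prompt wordings below are hypothetical (the paper's exact templates are not reproduced here); they only illustrate the exemplar-free, zero-shot style: a COT STEP-like prompt asks the model to number its steps, and the numbered markers are then used to split the completion into atomic steps for the verifier.

```python
import re

# Hypothetical prompt templates -- placeholders illustrating the zero-shot
# style described above, not the paper's exact wording.
COT_STEP_PROMPT = (
    "Q: {question}\n"
    "A: Let's think step by step, labeling each step as 'Step 1:', 'Step 2:', ...\n"
)
STEP_VERIFY_PROMPT = (
    "Question: {question}\n"
    "Proposed reasoning step: {step}\n"
    "Is this step correct? Answer with a score between 0 and 1.\n"
)

def parse_steps(chain: str) -> list[str]:
    """Split an LLM completion into atomic steps using the 'Step k:' markers."""
    parts = re.split(r"Step \d+:\s*", chain)
    return [p.strip() for p in parts if p.strip()]

# Example completion in the requested numbered format.
chain = "Step 1: 3 apples + 2 apples = 5 apples. Step 2: The answer is 5."
steps = parse_steps(chain)  # two atomic steps, ready for step-level scoring
```

Each extracted step can then be inserted into the verification prompt and scored independently, which is what enables step-level rather than chain-level feedback.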
📝 Abstract
Previous works have demonstrated the effectiveness of Chain-of-Thought (COT) prompts and verifiers in guiding Large Language Models (LLMs) through the space of reasoning. However, most such studies either use a fine-tuned verifier or rely on manually handcrafted few-shot examples. In contrast, in this paper, we focus on LLM-based self-verification of self-generated reasoning steps via COT prompts in a completely zero-shot regime. To explore this setting, we design a new zero-shot prompt, which we call COT STEP, to aid automatic decomposition of reasoning into steps, along with two new zero-shot prompts for LLM-based verifiers. We evaluate the verifiers' ability to classify the correctness of reasoning chains and explore different ways to use verifier scores to guide reasoning on various mathematical and commonsense reasoning tasks with different LLMs.
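One way to use verifier scores, as the abstract mentions, is reranking: sample several reasoning chains, score each step with the zero-shot verifier, aggregate the step scores into a chain score, and keep the best chain. The sketch below is a minimal illustration with stubbed verifier outputs; the product aggregation is one plausible choice (minimum or mean are equally reasonable), and the paper's actual aggregation rule is not assumed here.

```python
import math

def rerank(chains: list[str], step_scores: list[list[float]]) -> str:
    """Verification-guided reranking: aggregate per-step verifier scores
    into a chain score (here, their product) and return the best chain."""
    def chain_score(scores: list[float]) -> float:
        return math.prod(scores)
    best_chain, _ = max(zip(chains, step_scores),
                        key=lambda cs: chain_score(cs[1]))
    return best_chain

# Stubbed verifier outputs: per-step correctness scores for 3 sampled chains.
chains = ["chain A", "chain B", "chain C"]
scores = [[0.9, 0.4],   # product 0.36
          [0.8, 0.8],   # product 0.64  <- highest
          [0.95, 0.5]]  # product 0.475
best = rerank(chains, scores)  # selects "chain B"
```

Resampling, the other strategy named in the summary, would instead discard low-scoring chains (or steps) and query the model again, trading extra LLM calls for higher-quality chains.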