🤖 AI Summary
Small language models (SLMs) suffer from low reasoning accuracy, and conventional optimization methods yield limited improvements. Method: This paper proposes Cycle-Consistent Question Answering (CCQA), a novel inference refinement framework that introduces a backward question-generation mechanism: given a candidate answer derived from a reasoning path, CCQA reconstructs a question and computes its semantic similarity to the original input; the answer is selected by cycle consistency, that is, high similarity between the reconstructed and original questions, without additional training or parameter expansion. Contribution/Results: CCQA is the first systematic application of cycle consistency to SLM reasoning optimization. Implemented with a lightweight Flan-T5-based question generator, it achieves consistent and significant gains across eight distinct SLMs on both mathematical and commonsense reasoning benchmarks, surpassing existing state-of-the-art methods. The approach establishes an efficient, scalable, and training-free paradigm for enhancing reasoning in small language models.
📝 Abstract
Recently, inference-time reasoning strategies have further improved the accuracy of large language models (LLMs), but their effectiveness on small language models (SLMs) remains unclear. Based on the observation that conventional approaches often fail to improve performance in this setting, we propose **C**ycle-**C**onsistency in **Q**uestion **A**nswering (CCQA), a novel reasoning method that can be effectively applied to SLMs. Inspired by cycle consistency, CCQA generates a question from each reasoning path and answer, evaluates each by its similarity to the original question, and then selects the candidate solution with the highest similarity score as the final response. Since conventional SLMs struggle to generate accurate questions from their own reasoning paths and answers, we employ a lightweight Flan-T5 model specialized for question generation to support this process efficiently. Experimental results verify that CCQA consistently outperforms existing state-of-the-art (SOTA) methods across eight models on mathematical and commonsense reasoning benchmarks. Furthermore, our method establishes a new practical baseline for efficient reasoning in SLMs. Source code can be found at https://github.com/scai-research/ccqa_official.
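The selection procedure described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `generate_question` stands in for the Flan-T5 question generator, and a toy token-overlap (Jaccard) score stands in for the semantic similarity measure; the function names and the stub data are hypothetical.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Toy stand-in for semantic similarity: token-overlap ratio."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def ccqa_select(original_question, candidates, generate_question, similarity):
    """Cycle-consistency selection: for each (reasoning_path, answer)
    candidate, reconstruct a question and keep the candidate whose
    reconstruction is most similar to the original question."""
    best, best_score = None, float("-inf")
    for path, answer in candidates:
        reconstructed = generate_question(path, answer)
        score = similarity(reconstructed, original_question)
        if score > best_score:
            best, best_score = (path, answer), score
    return best, best_score

# Demo with a stub "generator" that looks up a canned question per answer.
stub_questions = {
    "a1": "how many apples does tom have",
    "a2": "what is the capital of france",
}
gen = lambda path, ans: stub_questions[ans]
question = "how many apples does tom have"
cands = [("path-1", "a1"), ("path-2", "a2")]
best, score = ccqa_select(question, cands, gen, jaccard_similarity)
```

In practice the candidates would come from sampling multiple reasoning paths from the SLM, and the similarity function would be a learned semantic score rather than token overlap.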