Recursive Think-Answer Process for LLMs and VLMs

📅 2026-03-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the tendency of large language models (LLMs) and vision-language models (VLMs) to produce output errors in single-pass inference. Self-reflection cues may appear in the reasoning trace, but without an effective error-correction mechanism they do not prevent wrong answers. To overcome this limitation, the authors propose the Recursive Think-Answer Process (R-TAP), which runs repeated think-answer cycles and uses a confidence assessment to guide each refinement. R-TAP incorporates a confidence generator and two complementary rewards, the Recursively Confidence Increase Reward and the Final Answer Confidence Reward, to suppress unproductive self-reflection and improve both reasoning stability and accuracy. The reported experiments show R-TAP-enhanced models consistently outperforming single-pass inference on LLM and VLM tasks, with fewer self-reflective patterns and faster inference-time reasoning.
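The recursive loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the stopping threshold, the round limit, and the toy model and confidence function are all assumptions made for illustration, since this summary does not specify how R-TAP decides to stop or how the confidence generator is built.

```python
from typing import Callable, List, Tuple

# One (answer, confidence) pair per completed think-answer round.
History = List[Tuple[str, float]]


def recursive_think_answer(
    question: str,
    think_answer: Callable[[str, History], str],  # hypothetical model call
    confidence: Callable[[str, str], float],      # hypothetical confidence generator
    threshold: float = 0.9,                       # assumed stopping threshold
    max_rounds: int = 4,                          # assumed round limit
) -> Tuple[str, float, int]:
    """Repeat think-answer rounds until the confidence score is high enough.

    Returns the most confident answer, its score, and the rounds used.
    """
    history: History = []
    for _ in range(max_rounds):
        # The model sees its earlier attempts and their scores.
        answer = think_answer(question, history)
        score = confidence(question, answer)
        history.append((answer, score))
        if score >= threshold:
            break
    best_answer, best_score = max(history, key=lambda pair: pair[1])
    return best_answer, best_score, len(history)


# Toy stand-ins so the sketch runs without a real model.
def toy_model(question: str, history: History) -> str:
    candidates = ["5", "4"]  # first attempt is wrong, second is right
    return candidates[min(len(history), len(candidates) - 1)]


def toy_confidence(question: str, answer: str) -> float:
    return 0.95 if answer == "4" else 0.3


result = recursive_think_answer("What is 2 + 2?", toy_model, toy_confidence)
print(result)
```

In this toy run the first answer scores below the threshold, so a second round is triggered and its higher-confidence answer is returned. A single-pass baseline corresponds to `max_rounds=1`.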

📝 Abstract

Think-Answer reasoners such as DeepSeek-R1 have made notable progress by leveraging interpretable internal reasoning. However, despite the frequent presence of self-reflective cues like "Oops!", they remain vulnerable to output errors during single-pass inference. To address this limitation, we propose an efficient Recursive Think-Answer Process (R-TAP) that enables models to engage in iterative reasoning cycles and generate more accurate answers, going beyond conventional single-pass approaches. Central to this approach is a confidence generator that evaluates the certainty of model responses and guides subsequent improvements. By incorporating two complementary rewards, the Recursively Confidence Increase Reward and the Final Answer Confidence Reward, we show that R-TAP-enhanced models consistently outperform conventional single-pass methods for both large language models (LLMs) and vision-language models (VLMs). Moreover, by analyzing the frequency of "Oops"-like expressions in model responses, we find that R-TAP-applied models exhibit significantly fewer self-reflective patterns, resulting in more stable and faster inference-time reasoning. We hope R-TAP paves the way toward efficient and elaborate methods for refining the reasoning processes of future AI.
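The abstract names the two rewards but does not give their formulas. The Python sketch below shows one plausible way such rewards could be computed from per-round confidence scores. Every formula, weight, and function name here is an assumption made for illustration and should not be read as the paper's definition.

```python
from typing import Sequence


def recursive_confidence_increase_reward(confidences: Sequence[float]) -> float:
    """Assumed form: fraction of consecutive rounds where confidence rose."""
    if len(confidences) < 2:
        return 0.0  # a single round has no round-to-round change to reward
    increases = sum(
        1 for prev, cur in zip(confidences, confidences[1:]) if cur > prev
    )
    return increases / (len(confidences) - 1)


def final_answer_confidence_reward(
    confidences: Sequence[float], is_correct: bool
) -> float:
    """Assumed form: last-round confidence, granted only for a correct answer."""
    if not confidences:
        return 0.0
    return confidences[-1] if is_correct else 0.0


def total_reward(
    confidences: Sequence[float],
    is_correct: bool,
    w_increase: float = 0.5,  # hypothetical weight
    w_final: float = 0.5,     # hypothetical weight
) -> float:
    """Weighted sum of the two illustrative rewards."""
    return (
        w_increase * recursive_confidence_increase_reward(confidences)
        + w_final * final_answer_confidence_reward(confidences, is_correct)
    )


# Confidence rises every round and the final answer is correct.
rising = total_reward([0.2, 0.5, 0.9], is_correct=True)
# Confidence falls and the final answer is wrong.
falling = total_reward([0.9, 0.5, 0.2], is_correct=False)
print(rising, falling)
```

Under these assumed definitions, a trajectory whose confidence climbs toward a correct final answer earns more than one that drifts downward, which matches the stated intent of rewarding productive refinement over unproductive self-reflection.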

Problem

Research questions and friction points this paper is trying to address.

Think-Answer reasoning

output errors

single-pass inference

self-reflective cues

reasoning stability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Recursive Think-Answer Process

confidence-based reasoning

iterative refinement