🤖 AI Summary
This work addresses the limited ability of large language models (LLMs) to perform structured, multi-step reasoning in complex mathematical and creative writing tasks, particularly their difficulty in exploring alternative reasoning paths and transferring knowledge across problems. To overcome this, the paper proposes ReTreVal, a framework that integrates Tree-of-Thoughts exploration, self-refinement, LLM-based critique scoring, and a persistent reflexion memory to construct a structured reasoning tree with a dual validation mechanism. The framework supports dynamic pruning via top-k retention and adapts reasoning depth to problem complexity. Experiments on 500 mathematical and creative writing tasks show that ReTreVal, built on Qwen 2.5 7B, outperforms ReAct, Reflexion, and Self-Refine, improving both the quality of exploratory reasoning and the efficiency of cross-task knowledge reuse.
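The persistent reflexion memory described above can be pictured as a bounded buffer of free-text insights keyed by task category, retrievable when a later, related problem is attempted. The sketch below is a minimal illustration under assumed semantics; the class name, fields, and eviction policy are hypothetical and not taken from the paper.

```python
from collections import defaultdict


class ReflexionMemory:
    """Hypothetical sketch of a cross-problem reflection buffer.

    Stores insights from successful reasoning paths and failure
    patterns, keyed by task category, so later problems can
    retrieve relevant lessons before reasoning begins.
    """

    def __init__(self, max_per_category: int = 50):
        self.max_per_category = max_per_category
        self._buffer = defaultdict(list)

    def add(self, category: str, insight: str, success: bool) -> None:
        entries = self._buffer[category]
        entries.append({"insight": insight, "success": success})
        # Evict the oldest entry to keep memory growth bounded.
        if len(entries) > self.max_per_category:
            del entries[0]

    def retrieve(self, category: str, k: int = 3) -> list[str]:
        # Prefer insights from successful paths; Python's stable sort
        # preserves insertion order among entries with equal success.
        entries = sorted(self._buffer[category],
                         key=lambda e: e["success"], reverse=True)
        return [e["insight"] for e in entries[:k]]
```

In this reading, retrieved insights would simply be prepended to the prompt for a new problem of the same category, which is one plausible way to realize the cross-problem learning the summary claims.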
📝 Abstract
Multi-step reasoning remains a key challenge for Large Language Models (LLMs), particularly in complex domains such as mathematics and creative writing. While recent approaches including ReAct, Reflexion, and Self-Refine improve reasoning through iterative refinement and reflection, they often lack structured exploration of alternative solution paths and persistent learning across problems. We propose ReTreVal (Reasoning Tree with Validation), a hybrid framework that integrates Tree-of-Thoughts exploration, self-refinement, LLM-based critique scoring, and reflexion memory to enable bounded and validated multi-step reasoning. ReTreVal constructs a structured reasoning tree with adaptive depth based on problem complexity, where each node undergoes iterative self-critique and refinement guided by explicit LLM-generated feedback. A dual validation mechanism evaluates reasoning quality, coherence, and correctness at each node, while insights from successful reasoning paths and failure patterns are persistently stored in a reflexion memory buffer, enabling cross-problem learning. Critique-based pruning retains only the top-k highest-scoring nodes at each level, controlling computational cost while preserving high-quality solution paths. We evaluate ReTreVal against ReAct, Reflexion, and Self-Refine across 500 mathematical problems and creative writing tasks, using Qwen 2.5 7B as the underlying LLM. ReTreVal consistently outperforms these baselines through its combination of structured exploration, critique-driven refinement, and cross-problem memory, making it particularly effective for tasks that require exploratory reasoning, rigorous verification, and knowledge transfer.
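The core search loop (tree expansion with critique-based top-k pruning at each level) can be sketched as below. This is a minimal, model-free illustration: `expand` stands in for LLM thought generation plus critique scoring, and its deterministic scoring rule is a placeholder assumption, not the paper's method.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Node:
    """A node in the reasoning tree: a partial thought and its critique score."""
    thought: str
    score: float
    parent: "Node | None" = None


def expand(node: Node, branching: int) -> list[Node]:
    # Placeholder for LLM generation + critique scoring. Children are
    # fabricated deterministically so the sketch runs without a model.
    return [Node(f"{node.thought}.{i}", node.score + 0.1 * i, node)
            for i in range(branching)]


def retreval_search(root: Node, depth: int, branching: int, top_k: int) -> Node:
    """Level-by-level tree search with critique-based top-k pruning."""
    frontier = [root]
    for _ in range(depth):
        children = [c for n in frontier for c in expand(n, branching)]
        # Prune: keep only the top-k highest-scoring nodes at this level,
        # bounding cost to top_k * branching expansions per level.
        frontier = heapq.nlargest(top_k, children, key=lambda n: n.score)
    return max(frontier, key=lambda n: n.score)
```

With `depth=2`, `branching=3`, `top_k=2`, at most 2 × 3 = 6 nodes are scored per level regardless of how deep the tree grows, which is the cost-control property the abstract attributes to pruning; in the full framework, adaptive depth would replace the fixed `depth` loop bound.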