Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning

📅 2025-10-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing natural-language chain-of-thought (N-CoT) and programmatic chain-of-thought (P-CoT) approaches for mathematical reasoning pursue only unidirectional enhancement, so neither paradigm fully leverages the complementary strengths of the other. Method: We propose Parrot, the first framework enabling bidirectional mutual enhancement between N-CoT and P-CoT. It comprises a three-stage subtask design, a hybrid training strategy, and an N-CoT-guided reward mechanism that mitigates the sparse-reward problem in P-CoT optimization. Built upon the LLaMA2 and CodeLLaMA architectures, Parrot integrates instruction tuning, semantic transfer, and multi-task joint optimization. Contribution/Results: On MathQA, Parrot improves the N-CoT accuracy of LLaMA2 and CodeLLaMA by 21.87 and 21.48 percentage points, respectively, over a computationally intensive RL baseline, significantly outperforming prior methods. Empirical results demonstrate that dual-path collaborative modeling yields substantial gains in mathematical reasoning capability.

📝 Abstract
Natural language chain-of-thought (N-CoT) and program chain-of-thought (P-CoT) have emerged as two primary paradigms for large language models (LLMs) to solve mathematical reasoning problems. Current research typically pursues unidirectional enhancement: P-CoT enhancing N-CoT, or N-CoT enhancing P-CoT. In this paper, we seek to fully unleash the strengths of the two paradigms for mutual enhancement and ultimately achieve simultaneous improvements. We conduct a detailed analysis of the error types across the two paradigms, based on which we propose Parrot, a novel training pipeline for mathematical problems: 1) three target-designed subtasks that integrate sequential P-CoT and N-CoT generation; 2) a subtask hybrid training strategy that facilitates natural language semantic transferability; and 3) a converted N-CoT auxiliary reward that alleviates the sparse rewards in P-CoT optimization. Extensive experiments demonstrate that Parrot significantly enhances the performance of both N-CoT and P-CoT, especially N-CoT. With Parrot SFT, the N-CoT performance of LLaMA2 and CodeLLaMA achieves gains of +21.87 and +21.48 on MathQA over the resource-intensive RL baseline.
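
To make component 3 concrete, here is a minimal sketch of how a converted N-CoT could densify the sparse correctness reward during P-CoT optimization. The helper name `step_overlap`, the weight `ALPHA`, and the string-matching heuristic are assumptions for illustration, not the paper's actual reward design.

```python
# Minimal sketch of an N-CoT auxiliary reward (illustrative, not the
# paper's implementation). The sparse terminal reward (1 only when the
# executed program's answer is correct) is augmented with a denser
# signal computed from the N-CoT converted out of the sampled P-CoT.

def step_overlap(converted_steps: list[str], reference_steps: list[str]) -> float:
    """Fraction of reference reasoning steps matched by the converted N-CoT."""
    if not reference_steps:
        return 0.0
    matched = sum(1 for step in reference_steps if step in converted_steps)
    return matched / len(reference_steps)

ALPHA = 0.5  # assumed weight for the auxiliary term

def reward(executed_answer: str, gold_answer: str,
           converted_steps: list[str], reference_steps: list[str]) -> float:
    # Sparse component: exact-match correctness of the program's output.
    answer_reward = 1.0 if executed_answer == gold_answer else 0.0
    # Dense auxiliary component from the converted N-CoT: partial credit
    # for intermediate reasoning even when the final answer is wrong.
    return answer_reward + ALPHA * step_overlap(converted_steps, reference_steps)
```

The intuition: an executed program earns the exact-match reward only when its final answer is correct, so most rollouts receive zero signal; grading the converted N-CoT's intermediate steps supplies partial credit along the way.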
Problem

Research questions and friction points this paper is trying to address.

Existing approaches achieve only unidirectional enhancement between program and natural language CoT
How to enable mutual enhancement between the program and natural language CoT paradigms
How to mitigate sparse rewards in program CoT optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training pipeline integrates sequential program and natural language CoT generation through three target-designed subtasks
Subtask hybrid training strategy improves natural language semantic transferability (sketched below)
Converted natural language CoT auxiliary reward mitigates reward sparsity in program CoT optimization
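
A minimal sketch of the hybrid training idea, assuming three subtasks (P-CoT generation, P-CoT-to-N-CoT conversion, and N-CoT generation) interleaved into a single SFT mixture; the prompt templates and field names are illustrative, not taken from the paper.

```python
# Illustrative sketch of a subtask hybrid SFT mixture (assumed prompts and
# field names; the paper's exact subtask formats may differ). Each problem
# contributes one example per subtask, and the subtasks are interleaved so
# that every batch mixes all three.
import random

def build_example(problem: str, p_cot: str, n_cot: str, subtask: str) -> dict:
    if subtask == "p_cot":     # generate a program solution
        return {"input": f"Solve with a program:\n{problem}", "target": p_cot}
    if subtask == "convert":   # convert the program into natural language
        return {"input": f"Explain this program's reasoning in natural language:\n{problem}\n{p_cot}",
                "target": n_cot}
    return {"input": f"Solve step by step:\n{problem}", "target": n_cot}

def hybrid_mixture(records: list[dict], seed: int = 0) -> list[dict]:
    examples = [build_example(r["problem"], r["p_cot"], r["n_cot"], st)
                for r in records
                for st in ("p_cot", "convert", "n_cot")]
    random.Random(seed).shuffle(examples)  # interleave subtasks across batches
    return examples
```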
Senjie Jin
Fudan University
natural language processing
Lu Chen
College of Computer Science and Artificial Intelligence, Fudan University
Zhiheng Xi
Fudan University
LLM Reasoning; LLM-based Agents
Yuhui Wang
College of Computer Science and Artificial Intelligence, Fudan University
Sirui Song
College of Computer Science and Artificial Intelligence, Fudan University
Yuhao Zhou
College of Computer Science and Artificial Intelligence, Fudan University
Xinbo Zhang
ByteDance
Peng Sun
ByteDance Research
Hong Lu
College of Computer Science and Artificial Intelligence, Fudan University; Shanghai Key Laboratory of Intelligent Information Processing
Tao Gui
Shanghai Innovation Institute; Shanghai Key Laboratory of Intelligent Information Processing
Qi Zhang
College of Computer Science and Artificial Intelligence, Fudan University; Shanghai Key Laboratory of Intelligent Information Processing
Xuanjing Huang
College of Computer Science and Artificial Intelligence, Fudan University; Shanghai Key Laboratory of Intelligent Information Processing