Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning

📅 2025-10-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Natural-language chain-of-thought (N-CoT) and program chain-of-thought (P-CoT) are the two main paradigms for mathematical reasoning with LLMs, but existing work improves them in one direction only (P-CoT enhancing N-CoT, or the reverse), so neither fully exploits the other's strengths. Method: The paper proposes Parrot, a training pipeline for mutual enhancement between N-CoT and P-CoT. It comprises three purpose-built subtasks that generate P-CoT and N-CoT sequentially, a subtask hybrid training strategy that promotes natural-language semantic transfer, and an auxiliary reward from converted N-CoT that mitigates sparse rewards in P-CoT optimization. Experiments use LLaMA2 and CodeLLaMA. Contribution/Results: On MathQA, Parrot SFT raises N-CoT performance by +21.87 (LLaMA2) and +21.48 (CodeLLaMA) over the resource-intensive RL baseline. Both N-CoT and P-CoT improve, with the larger gains on N-CoT.

📝 Abstract

Natural language chain-of-thought (N-CoT) and program chain-of-thought (P-CoT) have emerged as two primary paradigms for large language models (LLMs) to solve mathematical reasoning problems. Current research typically pursues unidirectional enhancement: P-CoT-enhanced N-CoT or N-CoT-enhanced P-CoT. In this paper, we seek to fully unleash the two paradigms' strengths for mutual enhancement and ultimately achieve simultaneous improvements. We conduct a detailed analysis of the error types across the two paradigms, based on which we propose Parrot, a novel training pipeline for mathematical problems: 1) Three target-designed subtasks integrate sequential P-CoT and N-CoT generation. 2) A subtask hybrid training strategy facilitates natural language semantic transferability. 3) A converted N-CoT auxiliary reward alleviates the sparse rewards in P-CoT optimization. Extensive experiments demonstrate that Parrot significantly enhances the performance of both N-CoT and P-CoT, especially N-CoT. Using Parrot SFT, the N-CoT performance of LLaMA2 and CodeLLaMA achieves gains of +21.87 and +21.48, respectively, on MathQA over the resource-intensive RL baseline.
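
To make the two paradigms concrete, here is a minimal sketch of how an answer is obtained from each. The toy question, the `answer` variable convention, the "The answer is" phrasing, and the helper names `run_p_cot` and `parse_n_cot` are all illustrative assumptions, not formats taken from the paper.

```python
import re

# Hypothetical toy problem, not taken from the paper or from MathQA.
QUESTION = "A shop sells pens at 3 dollars each. How much do 4 pens cost?"

# P-CoT: the reasoning is a program; the answer comes from executing it.
P_COT = """
price_per_pen = 3
num_pens = 4
answer = price_per_pen * num_pens
"""

# N-CoT: the reasoning is prose; the answer is parsed out of the text.
N_COT = "Each pen costs 3 dollars. For 4 pens the cost is 3 * 4 = 12. The answer is 12."


def run_p_cot(program: str):
    """Execute a P-CoT program and return its `answer` variable, or None on failure."""
    scope = {}
    try:
        # Assumed convention: the program stores its result in `answer`.
        # Running untrusted model output would need a sandbox in practice.
        exec(program, {}, scope)
    except Exception:
        return None
    return scope.get("answer")


def parse_n_cot(text: str):
    """Extract the final numeric answer from an N-CoT string, or None if absent."""
    match = re.search(r"The answer is\s*(-?\d+(?:\.\d+)?)", text)
    return float(match.group(1)) if match else None
```

With these definitions, `run_p_cot(P_COT)` and `parse_n_cot(N_COT)` both yield 12, while a program that raises an error yields `None`. That all-or-nothing outcome is one way to picture why a reward based only on program execution can be sparse.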

Problem

Research questions and friction points this paper is trying to address.

Enhances both program and natural language chain-of-thought reasoning

Proposes mutual enhancement between program and natural language CoT paradigms

Addresses sparse rewards in program CoT optimization through a converted natural language CoT auxiliary reward

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training pipeline integrates sequential program and natural language CoT

Subtask hybrid training enhances natural language semantic transferability

Auxiliary reward from converted natural language CoT alleviates sparse rewards in program CoT optimization
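
The auxiliary-reward idea can be sketched as simple reward shaping. The source does not give the paper's reward formula, so everything below is an assumption for illustration: the function name `shaped_reward`, the `aux_weight` parameter, and the rule of granting partial credit when the converted N-CoT reaches the gold answer even though the program does not.

```python
def shaped_reward(p_answer, n_answer, gold, aux_weight=0.5, tol=1e-6):
    """Reward for one P-CoT sample with hypothetical N-CoT partial credit.

    p_answer: result of executing the P-CoT program (None if it failed).
    n_answer: answer parsed from the N-CoT converted from that program
              (None if no answer could be parsed).
    gold:     reference numeric answer.
    """
    def is_correct(value):
        return value is not None and abs(float(value) - gold) <= tol

    if is_correct(p_answer):
        return 1.0          # the usual sparse signal: the program is right
    if is_correct(n_answer):
        return aux_weight   # assumed auxiliary signal from the converted N-CoT
    return 0.0              # neither path reached the gold answer
```

Under this sketch, a sample whose program crashes but whose converted N-CoT is correct receives `aux_weight` instead of zero, so the policy gets a non-zero learning signal more often than with the execution-only reward.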

🔎 Similar Papers

Which Programming Language and What Features at Pre-training Stage Affect Downstream Logical Inference Performance?