Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning

📅 2025-02-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates whether small-to-medium open-source large language models (LLMs) can match the mathematical reasoning performance of powerful closed-source models (e.g., GPT-4). To this end, the authors propose Mixture of Opinions (MoO), a post-training method in which multiple weaker ancillary LLMs generate chain-of-thought (CoT) reasoning traces and answers; each post-training sample is augmented with these outputs so that the primary model learns from several perspectives. The key finding is that opinions from weaker models can improve a relatively stronger model's performance, without requiring additional human annotations or reinforcement learning. On mathematical reasoning benchmarks including GSM8K, MoO yields an average improvement of 5% in comparisons against standard supervised fine-tuning (SFT), few-shot prompting, and Mixture of Agents (MoA).

📝 Abstract

Recent advances in Large Language Models (LLMs) have raised interest in their formal reasoning capabilities, particularly in mathematics. While closed LLMs like GPT-4 perform well on mathematical benchmarks such as GSM8K, it remains unclear whether small to medium-sized open LLMs can achieve similar performance, which raises questions about their reliability. To close this gap, we propose a post-training approach that leverages a mixture of opinions (MoO) from weaker ancillary LLMs to enhance a (relatively) stronger LLM's reasoning. Each post-training sample is augmented with Chain-of-Thought (CoT) reasoning steps and answers from the ancillary LLMs, enabling the main LLM to learn from diverse perspectives. We compare MoO with standard supervised fine-tuning (SFT), few-shot prompting, and the Mixture of Agents (MoA) method on mathematical reasoning benchmarks. Our results show that incorporating weaker LLMs' opinions improves mathematical reasoning by an average of 5%, highlighting the value of diverse perspectives in reasoning tasks.
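
The sample augmentation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the prompt template, the `Opinion` structure, the `build_moo_sample` function, the model labels, and the toy arithmetic problem are all hypothetical, chosen only to show how CoT reasoning and answers from ancillary LLMs might be attached to a post-training sample.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Opinion:
    """One ancillary model's output for a question (hypothetical structure)."""
    model_name: str  # label for the ancillary LLM
    reasoning: str   # its chain-of-thought text
    answer: str      # its final answer, which may be wrong


def build_moo_sample(question: str, opinions: List[Opinion], target: str) -> Dict[str, str]:
    """Return a prompt/completion pair with the opinions placed in the prompt.

    The template below is an assumption for illustration; the paper's
    actual format is not given in the abstract.
    """
    parts = [f"Question: {question}", ""]
    for i, op in enumerate(opinions, start=1):
        parts.append(f"Opinion {i} ({op.model_name}):")
        parts.append(f"Reasoning: {op.reasoning}")
        parts.append(f"Answer: {op.answer}")
        parts.append("")
    parts.append("Considering the opinions above, solve the problem step by step.")
    return {"prompt": "\n".join(parts), "completion": target}


# Made-up toy data: one correct and one incorrect ancillary opinion.
opinions = [
    Opinion("model-a", "3 boxes times 4 pens is 12.", "12"),
    Opinion("model-b", "3 plus 4 is 7.", "7"),
]
sample = build_moo_sample(
    question="Tom has 3 boxes with 4 pens each. How many pens does he have?",
    opinions=opinions,
    target="Each of the 3 boxes holds 4 pens, so 3 * 4 = 12. The answer is 12.",
)
print(sample["prompt"])
```

In this sketch the incorrect opinion is kept in the prompt on purpose, since the premise is that the main model learns from diverse perspectives. Whether the paper filters or weights opinions is not stated in the abstract.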

Problem

Research questions and friction points this paper is trying to address.

Enhance LLM reasoning with weaker LLMs' opinions.

Improve mathematical reasoning using diverse perspectives.

Compare MoO with SFT, few-shot prompting, and MoA.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture of Opinions (MoO)

Chain-of-Thought (CoT) reasoning

Post-training sample augmentation
