Learning to Think from Multiple Thinkers

📅 2026-04-27

🤖 AI Summary
This work studies the computational challenge of learning from multiple thinkers who provide correct but systematically different chain-of-thought (CoT) rationales. Under cryptographic assumptions, it shows that learning can be computationally hard in passive data-collection settings, even when CoT supervision comes from only two or a few thinkers, for classes that are easy to learn from a single thinker's CoT. It then gives a generic, computationally efficient active learning algorithm that needs only a small amount of CoT data per thinker, O(log(1/ε) log log(1/ε)) thinkers, and Õ(1/ε) passive end-result examples to reach target accuracy ε. Notably, the amount of CoT data required per thinker is independent of ε.

📝 Abstract
We study learning with Chain-of-Thought (CoT) supervision from multiple thinkers, all of whom provide correct but possibly systematically different solutions, e.g., step-by-step solutions to math problems written by different thinkers, or step-by-step execution traces of different programs solving the same problem. We consider classes that are computationally easy to learn using CoT supervision from a single thinker, but hard to learn with only end-result supervision, i.e., without CoT (Joshi et al. 2025). We establish that, under cryptographic assumptions, learning can be hard from CoT supervision provided by two or a few different thinkers, in passive data-collection settings. On the other hand, we provide a generic computationally efficient active learning algorithm that learns with a small amount of CoT data per thinker that is completely independent of the target accuracy $\varepsilon$, a moderate number of thinkers that scales as $\log \frac{1}{\varepsilon}\log \log \frac{1}{\varepsilon}$, and sufficient passive end-result data that scales as $\frac{1}{\varepsilon}\cdot \mathrm{poly}\log\frac{1}{\varepsilon}$.
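To get a feel for the scaling stated in the abstract, the following minimal sketch evaluates the two growth rates for a few values of $\varepsilon$. It is only an illustration of the asymptotic expressions, not the paper's algorithm: the function names, the leading constant `c`, and the polylog exponent `k` are assumptions introduced here, since the abstract does not specify constants or the degree of the polylogarithmic factor.

```python
import math


def thinkers_growth(eps: float, c: float = 1.0) -> float:
    """Growth rate log(1/eps) * log log(1/eps).

    `c` is a hypothetical leading constant (not given in the abstract).
    Requires 0 < eps < 1/e so that log log(1/eps) is positive.
    """
    if not 0.0 < eps < 1.0 / math.e:
        raise ValueError("eps must lie in (0, 1/e)")
    inv = 1.0 / eps
    return c * math.log(inv) * math.log(math.log(inv))


def end_result_growth(eps: float, k: int = 1, c: float = 1.0) -> float:
    """Growth rate (1/eps) * log(1/eps)**k.

    `k` (polylog exponent) and `c` are hypothetical; the abstract only
    states (1/eps) * polylog(1/eps).
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    inv = 1.0 / eps
    return c * inv * math.log(inv) ** k


if __name__ == "__main__":
    # Compare how slowly the thinker count grows relative to end-result data.
    for eps in (1e-1, 1e-2, 1e-3, 1e-4):
        print(eps, thinkers_growth(eps), end_result_growth(eps))
```

Under these assumed constants, shrinking $\varepsilon$ by a factor of ten multiplies the end-result term by more than ten, while the thinker term grows only slightly faster than logarithmically, and the CoT data per thinker does not depend on $\varepsilon$ at all.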
Problem

Research questions and friction points this paper is trying to address.

Chain-of-Thought
multi-thinker learning
supervision
computational learning
active learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Thought
Multiple Thinkers
Active Learning
Computational Hardness
Supervised Learning