BC Protocol: Structured Dual-Expert Dialogue for Eliciting High-Quality Chain-of-Thought Post-Training Data

📅 2026-05-25

🤖 AI Summary

This study addresses the scarcity of high-quality chain-of-thought (CoT) data for large language model post-training: crowdsourced annotation lacks deep reasoning paths, solo expert writing suffers from the expert blind spot, and RLHF yields only preference signals rather than reasoning chains. The authors propose the BC Protocol, a structured dual-expert dialogue that pairs a domain expert with a knowledge engineer to systematically externalize the expert's implicit judgments as natural-language reasoning chains. The paper also introduces the Participant Aptitude Model, which defines six participant characteristic dimensions that affect elicitation quality; the original concept of “calibrated ignorance”; and the principle of “selection over prescription,” which holds that quality-control resources return more when spent on personnel selection than on process design. In a controlled experiment on narrative fiction, BC-generated reasoning chains far outscored those written by the same expert working alone on reasoning naturalness (mean scores: 4.80 vs. 1.30; p = 2.4 × 10⁻⁸; Cliff’s δ = 1.0).

📝 Abstract

High-quality expert chain-of-thought (CoT) data is one of the core bottlenecks in large language model (LLM) post-training. Existing data production methods each have structural limitations: crowdsourced annotation lacks deep reasoning paths; expert solo writing is constrained by the "expert blind spot," in which experts structurally skip reasoning steps they consider obvious; and RLHF produces only preference signals rather than reasoning chains. This paper proposes the BC Protocol, a structured dual-expert elicitation method for LLM post-training data production. The method pairs a domain expert (crystallized intelligence) with a knowledge engineer (fluid intelligence) and systematically externalizes the expert's implicit judgments as natural-language reasoning chains. We introduce the Participant Aptitude Model, which defines six participant characteristic dimensions that affect elicitation quality. "Calibrated Ignorance" is an original concept proposed in this paper. We further propose "Selection-over-Prescription" as a methodological principle: for implicit knowledge elicitation tasks, investing quality-control resources in personnel selection yields a higher return than investing the same resources in process design. In a controlled experiment in the narrative fiction domain, we directly compared CoT produced by BC Protocol dual-expert dialogue (Group A, n = 20) against CoT written independently by the same domain expert (Group B, n = 20). Three cross-vendor judge models (GPT-4o, Claude Opus 4.5, and Gemini 2.5 Pro) conducted blind evaluation across five dimensions (600 ratings total: 40 samples × 3 judges × 5 dimensions). Results show that the BC Protocol achieves an overwhelming advantage in "naturalness of reasoning process" (Group A mean 4.80 vs. Group B mean 1.30; p = 2.4 × 10⁻⁸; Cliff's δ = 1.0).
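To make the reported figures concrete, the following is a minimal sketch of how the rating count and a Cliff's δ of 1.0 arise. The function name `cliffs_delta` and the score lists are hypothetical illustrations, not the paper's code or data, and the abstract does not state which significance test produced the p-value, so none is reproduced here.

```python
from itertools import product


def cliffs_delta(a, b):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs (x in a, y in b)."""
    if not a or not b:
        raise ValueError("both groups must be non-empty")
    greater = sum(1 for x, y in product(a, b) if x > y)
    less = sum(1 for x, y in product(a, b) if x < y)
    return (greater - less) / (len(a) * len(b))


# Rating count implied by the experimental design in the abstract.
n_samples = 20 + 20   # Group A + Group B
n_judges = 3          # GPT-4o, Claude Opus 4.5, Gemini 2.5 Pro
n_dimensions = 5
total_ratings = n_samples * n_judges * n_dimensions  # 600

# Hypothetical scores (NOT the paper's data): every Group A score
# exceeds every Group B score, which is what delta = 1.0 means.
group_a = [5, 5, 4, 5]
group_b = [1, 2, 1, 1]
delta = cliffs_delta(group_a, group_b)  # 1.0
```

A δ of 1.0 is the maximum possible value: it indicates complete separation between the two groups, with no Group B score matching or exceeding any Group A score on that dimension.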

Problem

Research questions and friction points this paper is trying to address.

chain-of-thought

post-training data

expert blind spot

reasoning chains

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

BC Protocol

Chain-of-Thought

Calibrated Ignorance