🤖 AI Summary
To address the high labor cost and low efficiency of manual annotation in large-scale content labeling, this paper focuses on code documentation analysis and proposes MCHR, a semi-automated annotation framework. Methodologically, MCHR introduces (1) a novel structured consensus mechanism among multiple large language models (LLMs) that improves the robustness of automated labeling through collaborative reasoning and consensus aggregation; and (2) a difficulty-aware human review protocol that dynamically invokes human intervention based on task complexity. The framework supports diverse annotation tasks, from binary classification to open-set labeling, and employs open-set evaluation to rigorously assess generalization. Experimental results show that MCHR achieves a stable accuracy of 85.5%–98% while reducing annotation time by 32%–100% compared with fully manual labeling, significantly improving the scalability and practicality of annotation systems without compromising reliability.
📝 Abstract
Content annotation at scale remains challenging, requiring substantial human expertise and effort. This paper presents a case study in code documentation analysis, exploring the balance between automation efficiency and annotation accuracy. We introduce MCHR (Multi-LLM Consensus with Human Review), a novel semi-automated framework that improves annotation scalability by systematically integrating multiple LLMs with targeted human review. Our framework combines a structured consensus-building mechanism among LLMs with an adaptive review protocol that strategically engages human expertise. Through our case study, we demonstrate that MCHR reduces annotation time by 32% to 100% compared to manual annotation while maintaining high accuracy (85.5% to 98%) across difficulty levels, from basic binary classification to challenging open-set scenarios.
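The abstract's core loop, LLM consensus with a difficulty-triggered escalation to human review, can be illustrated with a minimal sketch. All names, the majority-vote aggregation, and the agreement-ratio threshold here are illustrative assumptions; the paper's actual consensus and difficulty-estimation protocol is not specified in this summary.

```python
from collections import Counter

def consensus_label(labels, agreement_threshold=1.0):
    """Aggregate labels proposed by several LLMs via majority vote.

    Returns (label, needs_review). Human review is triggered when the
    inter-model agreement ratio falls below the threshold -- a simple
    stand-in for a difficulty-aware review trigger.
    """
    counts = Counter(labels)
    top_label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(labels)
    return top_label, agreement < agreement_threshold

# Unanimous votes are accepted automatically...
label, review = consensus_label(["outdated", "outdated", "outdated"])
# → ("outdated", False)

# ...while disagreement routes the item to a human reviewer.
label2, review2 = consensus_label(["outdated", "accurate", "outdated"])
# → ("outdated", True)
```

With `agreement_threshold=1.0` any dissent escalates to a human; lowering it (e.g. to 0.6) trades review effort for automation, which mirrors the efficiency/accuracy balance the case study explores.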