When Can Human-AI Teams Outperform Individuals? Tight Bounds with Impossibility Guarantees

📅 2026-05-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses a persistent problem: human–AI teams often fail to outperform their best individual member, and there is little theoretical guidance on when complementarity can be achieved. By integrating signal detection theory with information theory, the authors derive tight performance bounds for confidence-based aggregation rules, prove a complementarity theorem alongside an impossibility result, and identify a threshold condition on error correlation that determines whether the team outperforms its best member. The framework is extended to multiclass settings, yielding a scaling law for the correlation threshold. Theoretical predictions agree closely with observed team accuracy on ImageNet-16H and CIFAR-10H (R = 0.94 and R = 0.91, respectively), the multiclass threshold scaling holds on human data, and the results remain robust under non-Gaussian distributions.
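Why error correlation sets a threshold can be illustrated with a textbook toy model. This model is an assumption for illustration, not the paper's derivation: human and machine each observe an equal-variance Gaussian signal with sensitivities d'_H and d'_M, their latent noises correlate at ρ, and the team pools the two observations with nonnegative weights. In this toy model the optimal weight on the weaker member falls to zero once ρ reaches the ratio of the weaker to the stronger sensitivity, so beyond that point pooling cannot beat the best member. The function names, and the use of latent noise correlation in place of the paper's error correlation ρ_HM, are hypothetical choices for the sketch.

```python
import math


def team_dprime(d_h: float, d_m: float, rho: float) -> float:
    """Sensitivity of the best nonnegative linear pooling of two
    equal-variance Gaussian observers (toy model, an assumption).

    d_h, d_m: sensitivities (d') of human and machine, both > 0.
    rho: correlation of their latent noise, in (-1, 1].
    """
    if d_h <= 0 or d_m <= 0:
        raise ValueError("sensitivities must be positive")
    if not -1.0 < rho <= 1.0:
        raise ValueError("rho must lie in (-1, 1]")
    d_hi, d_lo = max(d_h, d_m), min(d_h, d_m)
    # Optimal weights are proportional to Sigma^{-1} mu:
    #   w_hi ~ d_hi - rho * d_lo,  w_lo ~ d_lo - rho * d_hi.
    # Once rho >= d_lo / d_hi, w_lo would be <= 0, so the best
    # nonnegative rule ignores the weaker member: no complementarity.
    if rho >= d_lo / d_hi:
        return d_hi
    # Mahalanobis separation of the two class means: sqrt(mu^T Sigma^{-1} mu).
    return math.sqrt((d_hi**2 + d_lo**2 - 2 * rho * d_hi * d_lo) / (1 - rho**2))


def accuracy(dprime: float) -> float:
    """Accuracy of an unbiased observer in a two-class task: Phi(d'/2)."""
    return 0.5 * (1 + math.erf(dprime / (2 * math.sqrt(2))))


# Hypothetical sensitivities: weaker human (1.0), stronger machine (1.5).
for rho in (0.0, 0.5, 0.8):
    d_team = team_dprime(1.0, 1.5, rho)
    print(f"rho={rho:.1f}  team d'={d_team:.3f}  accuracy={accuracy(d_team):.3f}")
```

With these hypothetical values the toy threshold is 2/3: at ρ = 0.5 the pooled sensitivity is about 1.53, slightly above the machine's 1.5, while at ρ = 0.8 the function returns 1.5 exactly, because the weaker member no longer adds usable information.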

📝 Abstract

Human-AI teams fail to outperform their best member in 70% of studies, yet no theory specifies when complementarity is achievable. We derive tight bounds for the broad class of confidence-based aggregation rules by integrating signal detection theory with information-theoretic analysis, yielding four results: (1) a complementarity theorem (teams outperform individuals iff error correlation $\rho_{HM} < \rho^*$, with $\rho^* \approx a$ in the symmetric near-chance regime); (2) minimax bounds showing gains scale as $\Theta(\sqrt{\Delta d})$ with the metacognitive sensitivity difference; (3) an impossibility result proving no confidence-based aggregation rule achieves complementarity when $\rho_{HM} \geq \rho^*$; and (4) a multi-class generalization $\rho^*_K \approx \rho^*/\sqrt{K-1}$. Predictions match observed team accuracy ($R = 0.94$ on ImageNet-16H, $R = 0.91$ on CIFAR-10H) and the multi-class threshold scaling holds on human data ($R = 0.93$, $K = 16$), with robustness under non-Gaussian distributions. The framework explains why complementarity is rare and provides actionable design formulas; results apply to aggregation, not to interactive deliberation that generates novel answers.

Problem

Research questions and friction points this paper is trying to address.

human-AI teams

complementarity

error correlation

confidence-based aggregation

team performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

human-AI collaboration

complementarity

confidence-based aggregation