🤖 AI Summary
This study investigates whether improvements in AI models' reasoning capabilities naturally enhance their effectiveness as teachers, that is, their ability to convey understandable and transferable knowledge to humans in human-AI collaboration. Method: We propose KITE, a novel evaluation framework that quantifies the causal impact of model explanations on humans' subsequent independent problem-solving performance via a two-stage behavioral experiment (N=118). Contribution/Results: We provide the first systematic definition and empirical measurement of human-AI knowledge transfer efficacy, revealing only a weak correlation between standard model benchmark performance and actual knowledge transfer, evidence of a "knowledge-rich but explanation-poor" phenomenon. We identify key behavioral and strategic factors governing successful knowledge transfer and demonstrate that explainability and pedagogical alignment must be explicitly optimized. To support reproducibility and further research, we open-source the KITE toolkit, including implementation code, annotated datasets, and standardized evaluation protocols.
📝 Abstract
Recent advancements in AI reasoning have driven substantial improvements across diverse tasks. A critical open question is whether these improvements also yield better knowledge transfer: the ability of models to communicate reasoning in ways humans can understand, apply, and learn from. To investigate this, we introduce Knowledge Integration and Transfer Evaluation (KITE), a conceptual and experimental framework for Human-AI knowledge transfer capabilities, and conduct the first large-scale human study (N=118) explicitly designed to measure it. In our two-phase setup, humans first ideate with an AI on problem-solving strategies, then independently implement solutions, which isolates the influence of model explanations on human understanding. Our findings reveal that although model benchmark performance correlates with collaborative outcomes, this relationship is notably inconsistent and features significant outliers, indicating that knowledge transfer requires dedicated optimization. Our analysis identifies behavioral and strategic factors mediating successful knowledge transfer. We release our code, dataset, and evaluation framework to support future work on communicatively aligned models.