🤖 AI Summary
Existing large language models (LLMs) predominantly rely on static knowledge retrieval, lacking the capability to actively identify knowledge gaps and formulate high-value questions—critical for dynamic knowledge acquisition in real-world scenarios such as educational tutoring and medical consultation.
Method: This work introduces a systematic “student-led interactive learning” paradigm, endowing LLMs with three core capabilities: knowledge-gap detection, strategic question generation, and dynamic feedback integration. We propose a DPO-based question-quality optimization framework that supports both cross-model distillation of questioning strategies (from strong models to smaller ones) and self-distillation. The framework integrates dynamic interaction modeling, self-skepticism mechanisms, and multi-turn knowledge consolidation prompting.
Contribution/Results: Evaluated on mathematical reasoning and programming benchmarks, our approach achieves absolute Pass@k improvements of at least 0.5 over static baselines. After DPO fine-tuning, smaller models ask markedly better questions and learn more efficiently, validating the paradigm’s scalability and effectiveness.
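The question-quality preference training described above follows the standard DPO objective: a preferred (higher-quality) question is scored against a dispreferred one via log-probability ratios between the trained policy and a frozen reference model. As a minimal sketch, the function below computes the per-pair DPO loss from sequence log-probabilities; the function name and the toy log-probability values are illustrative assumptions, not details from the paper:

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss from sequence log-probs under the policy and a frozen reference."""
    # Implicit rewards: beta-scaled log-ratio of policy vs. reference probability
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # DPO objective: -log sigmoid(reward margin between chosen and rejected)
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy preference pair: the policy assigns higher probability to a targeted
# question (chosen) than to a vague one (rejected); the reference is neutral.
loss_aligned = dpo_loss(-5.0, -9.0, -7.0, -7.0)    # policy already prefers chosen
loss_misaligned = dpo_loss(-9.0, -5.0, -7.0, -7.0)  # policy prefers rejected
assert loss_aligned < loss_misaligned  # loss is lower when preferences agree
```

Minimizing this loss pushes the student model's probability mass toward the questions judged higher-quality (here, by a stronger student or the model itself), without needing an explicit reward model.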
📝 Abstract
Large Language Models (LLMs) excel at static interactions, where they answer user queries by retrieving knowledge encoded in their parameters. However, in many real-world settings, such as educational tutoring or medical assistance, relevant information is not directly available and must be actively acquired through dynamic interactions. An interactive agent would recognize its own uncertainty, ask targeted questions, and retain new knowledge efficiently. Prior work has primarily explored effective ways for a teacher to instruct the student, where the teacher identifies student gaps and provides guidance. In this work, we shift the focus to the student and investigate effective strategies for actively querying the teacher to seek useful information. Across math and coding benchmarks, where baseline student models begin with near-zero performance, we show that student-led approaches consistently yield absolute Pass@k improvements of at least 0.5 over static baselines. To improve question quality, we train students using Direct Preference Optimization (DPO) with guidance from either themselves or stronger students. We find that this guided training enables smaller models to learn how to ask better questions, further enhancing learning efficiency.