AVSD: Adaptive-View Self-Distillation by Balancing Consensus and Teacher-Specific Privileged Signals

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses a challenge in self-distillation: the teacher is conditioned on privileged information that the student cannot access, and the best type of such information varies across tasks, which makes it hard to standardize on a single one. The authors propose AVSD, a method that jointly tackles the choice among multiple privileged-information views and the teacher-student asymmetry these views create. AVSD separates the token-level supervision produced by the different teacher views into a cross-view consensus signal and view-specific residual signals. The consensus sets the update direction, and a residual is added only when it aligns with that direction and stays proportionate to the consensus, which yields more robust token-level supervision. On mathematical reasoning benchmarks (AIME24, AIME25, HMMT25), AVSD outperforms single-view self-distillation and GRPO, with average Avg@8 gains of 3.1% and 2.2% over the strongest baselines on Qwen3-8B and Qwen3-4B, respectively. On code generation benchmarks (Codeforces, LiveCodeBench v6) with Qwen3-8B, it improves on the single-view self-distillation baseline by 2.4% on average.
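The results are reported as Avg@8. The page does not define the metric; under the common convention we assume here, it is the fraction of 8 sampled responses that are correct for each problem, averaged over all problems in a benchmark. A minimal sketch of that computation follows; the function name `avg_at_k` and its input layout are hypothetical.

```python
from statistics import mean


def avg_at_k(correct_flags: list[list[bool]], k: int = 8) -> float:
    """Avg@k under an assumed convention: per-problem accuracy over k samples, averaged over problems.

    correct_flags[i][j] is True if sample j for problem i was judged correct.
    """
    per_problem = []
    for flags in correct_flags:
        if len(flags) != k:
            raise ValueError(f"expected {k} samples per problem, got {len(flags)}")
        # Booleans sum as 0/1, so this is the fraction of correct samples.
        per_problem.append(sum(flags) / k)
    return mean(per_problem)


# Toy input (not from the paper): 6/8 correct on one problem, 2/8 on another.
toy = [[True] * 6 + [False] * 2, [True] * 2 + [False] * 6]
print(avg_at_k(toy))
```

Averaging over several samples per problem, rather than scoring a single greedy answer, reduces the variance of small benchmarks such as AIME, where each problem is a large share of the score.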

📝 Abstract

Self-distillation enables language models to learn on-policy from their own trajectories by using the same model as both student and teacher, with the teacher being conditioned on privileged information unavailable to the student. Such information can come in different types or views, such as solutions, demonstrations, feedback, or final answers. This setup provides dense token-level feedback without relying on a separate external model, but it creates a fundamental asymmetry: the teacher may rely on view-specific information that the student cannot access at inference time. Moreover, the best type of privileged information is often task-dependent, making it difficult to choose a single teacher view. In this work, we address both challenges jointly by introducing AVSD (Adaptive-View Self-Distillation), a novel self-distillation method with multiple privileged-information views, which reconstructs token-level supervision by separating stable cross-view consensus from view-specific residual signals. AVSD identifies the consensus signal shared across views, which provides a reliable update direction, and then selectively adds the view-specific residual signal to adjust the update magnitude when it both aligns with the consensus direction and remains proportionate to the consensus signal. Experiments on math competition benchmarks (AIME24, AIME25, and HMMT25) show that AVSD consistently outperforms both single-view self-distillation baselines and GRPO, achieving average Avg@8 gains of 3.1% and 2.2% over the strongest baselines on Qwen3-8B and Qwen3-4B, respectively. Moreover, on code-generation benchmarks (Codeforces, LiveCodeBench v6) using Qwen3-8B, AVSD outperforms the single-view self-distillation baseline by 2.4% on average.
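The abstract states the fusion rule only qualitatively. The sketch below is one way to read it, under assumptions that are ours rather than the paper's: each teacher view yields one scalar signal per token (for example, a teacher-minus-student log-probability), the consensus is the cross-view mean, a residual counts as "aligned" when it has the same sign as the consensus, and "proportionate" means |residual| <= rho * |consensus|. The name `fuse_views`, the parameter `rho`, and the averaging of accepted residuals are hypothetical.

```python
from statistics import fmean


def fuse_views(view_signals: list[list[float]], rho: float = 1.0) -> list[float]:
    """Hypothetical consensus-plus-gated-residual fusion of per-token teacher signals.

    view_signals[v][t] is the signal from teacher view v at token t (assumed
    to be a scalar such as a teacher-minus-student log-probability).
    """
    num_views = len(view_signals)
    num_tokens = len(view_signals[0])
    if any(len(s) != num_tokens for s in view_signals):
        raise ValueError("all views must score the same tokens")

    fused = []
    for t in range(num_tokens):
        per_view = [s[t] for s in view_signals]
        consensus = fmean(per_view)  # assumption: consensus = cross-view mean
        bonus = 0.0
        for signal in per_view:
            residual = signal - consensus  # view-specific part
            aligned = residual * consensus > 0  # same sign as consensus direction
            proportionate = abs(residual) <= rho * abs(consensus)
            if aligned and proportionate:
                bonus += residual
        # Accepted residuals share the consensus sign, so they change only the magnitude.
        fused.append(consensus + bonus / num_views)
    return fused


# Two views, two tokens (toy values, not from the paper).
print(fuse_views([[1.0, 0.5], [2.0, -0.5]]))
```

In the toy call, token 0 has consensus 1.5 and only the second view's residual (+0.5) passes both gates, so the fused signal grows to 1.75 without changing sign. Token 1 has a zero consensus because the views disagree, so no residual is added and the token receives no update. A smaller `rho` admits fewer residuals and keeps the supervision closer to the pure consensus.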

Problem

Research questions and friction points this paper is trying to address.

self-distillation

privileged information

view asymmetry

multi-view learning

token-level supervision

Innovation

Methods, ideas, or system contributions that make the work stand out.

self-distillation

privileged information

multi-view learning