Learning from All: Concept Alignment for Autonomous Distillation from Multiple Drifting MLLMs

📅 2025-10-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address concept drift—i.e., dynamic shifts in teacher reasoning distributions across multimodal large language models (MLLMs) that propagate bias and degrade student performance—this paper proposes Autonomous Preference Optimization (APO). APO establishes, for the first time, a theoretical linkage between concept drift and knowledge distillation, introducing a three-stage adaptive alignment paradigm: “learn–compare–critique.” It performs self-distillation via next-token prediction over multi-stream reasoning trajectories and incorporates a critical reflection mechanism for dynamic concept calibration. Empirically, APO achieves significant gains in consistency, robustness, and generalization over state-of-the-art methods. To support reproducible research, we publicly release CXR-MAX, a large-scale X-ray reasoning dataset comprising 170K distilled reasoning trajectories.
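The "learn, compare" stage contrasts reasoning trajectories from several teachers before self-distillation. A minimal sketch of one way such a comparison could be realized, using majority-vote consensus over teacher answers to form a chosen/rejected pair; the function and the scoring rule are illustrative assumptions, not the paper's actual algorithm:

```python
from collections import Counter

def build_preference_pair(trajectories):
    """Toy 'compare' step: given (reasoning, answer) pairs from several
    teachers, take the majority answer as consensus and form a
    chosen/rejected trajectory pair for preference optimization.

    Hypothetical helper for illustration only.
    """
    answers = [answer for _, answer in trajectories]
    consensus, _ = Counter(answers).most_common(1)[0]
    # Chosen: first trajectory agreeing with the consensus answer.
    chosen = next(t for t in trajectories if t[1] == consensus)
    # Rejected: first drifting trajectory, if any teacher disagreed.
    rejected = next((t for t in trajectories if t[1] != consensus), None)
    return consensus, chosen, rejected
```

With three teachers where two agree, the dissenting trajectory becomes the rejected sample; when all teachers agree, no rejected sample is produced and the pair degenerates to plain distillation.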

📝 Abstract
This paper identifies a critical yet underexplored challenge in distilling from multimodal large language models (MLLMs): the reasoning trajectories generated by multiple teachers exhibit concept drift, whereby their reasoning distributions evolve unpredictably and transmit biases to the student model, ultimately compromising its performance. To tackle this issue, we pioneer a theoretical connection between concept drift and knowledge distillation, casting the non-stationary reasoning dynamics from multiple MLLM teachers as next-token prediction over multi-stream reasoning trajectories. Guided by concept drift, we introduce the "learn, compare, critique" paradigm, culminating in autonomous preference optimization (APO). Under the active guidance of the teachers, the student model first learns and self-distills preferred thinking by comparing multiple teachers. It then engages in critical reflection over the drifting inferences from teachers, performing concept alignment through APO and ultimately yielding a robust, consistent, and generalizable model. Extensive experiments demonstrate superior consistency, robustness, and generalization compared with state-of-the-art knowledge distillation methods. In addition, we contribute a large-scale dataset, CXR-MAX (Multi-teachers Alignment X-rays), comprising 170,982 distilled reasoning trajectories derived from publicly accessible MLLMs based on MIMIC-CXR. Our code and data are public at: https://anonymous.4open.science/r/Autonomous-Distillation/.
Problem

Research questions and friction points this paper is trying to address.

Addresses concept drift in multimodal large language model distillation
Aligns reasoning trajectories from multiple drifting teacher models
Enhances student model robustness through autonomous preference optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns concepts from multiple drifting MLLM teachers
Uses autonomous preference optimization for critical reflection
Self-distills preferred thinking by comparing teacher trajectories
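Preference optimization over chosen/rejected teacher trajectories is commonly built on a DPO-style objective; the sketch below shows that underlying loss in pure Python. This is an illustrative assumption about the optimization family, with hypothetical log-probability inputs; it does not reproduce APO's autonomous critique mechanism:

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style loss on one preference pair of trajectories.

    logp_*  : student log-probabilities of the chosen/rejected trajectory
    ref_*   : reference-model log-probabilities of the same trajectories
    beta    : strength of the KL-style regularization toward the reference

    Illustrative only; APO's critical-reflection step is not shown here.
    """
    # Reward margin: how much more the student prefers the chosen
    # trajectory than the reference model does, versus the rejected one.
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    # -log sigmoid(margin): small when the margin is large and positive.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the student and reference assign identical log-probabilities the margin is zero and the loss is log 2; increasing the student's relative preference for the chosen trajectory drives the loss toward zero.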
Xiaoyu Yang
University of Cambridge
Speech recognition, machine learning
Jie Lu
Australian Artificial Intelligence Institute (AAII), Faculty of Engineering and Information Technology, University of Technology Sydney, Australia.
En Yu
Australian Artificial Intelligence Institute (AAII), Faculty of Engineering and Information Technology, University of Technology Sydney, Australia.