AI Summary
Existing knowledge distillation methods for object detection typically treat all instances equally and employ heuristic or teacher-only attention filtering mechanisms, overlooking the student model's learning dynamics and instance-level variations. This work proposes a learnable instance-aware attention filtering framework that, for the first time, integrates the student model's dynamic learning state into an instance-level attention mechanism. By introducing a trainable instance selector, the method adaptively reweights the importance of each instance, enabling end-to-end adaptive distillation. This approach departs from conventional static or teacher-dominated filtering paradigms and achieves significant performance gains on the KITTI and COCO benchmarks: a GFL ResNet-50 student model improves by 2% mAP without additional computational overhead, outperforming current state-of-the-art methods.
Abstract
As deep vision models grow increasingly complex in pursuit of higher performance, deployment efficiency has become a critical concern. Knowledge distillation (KD) mitigates this issue by transferring knowledge from large teacher models to compact student models. While many feature-based KD methods rely on spatial filtering to guide distillation, they typically treat all object instances uniformly, ignoring instance-level variability. Moreover, existing attention filtering mechanisms are usually heuristic or teacher-driven, rather than learned jointly with the student. To address these limitations, we propose Learnable Instance Attention Filtering for Adaptive Detector Distillation (LIAF-KD), a novel framework that introduces learnable instance selectors to dynamically evaluate and reweight instance importance during distillation. Notably, the student contributes to this process based on its evolving learning state. Experiments on the KITTI and COCO datasets demonstrate consistent improvements, with a 2% gain on a GFL ResNet-50 student without added complexity, outperforming state-of-the-art methods.
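The abstract does not specify the selector's architecture, so the following is only a minimal NumPy sketch of the general idea: a small learnable scorer looks at per-instance features from both student and teacher, produces importance weights, and those weights rescale a per-instance feature-distillation loss. All function names, the concatenated student/teacher input, the sigmoid-plus-normalization scoring, and the MSE distillation term are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def instance_selector(inst_feats, W, b):
    """Hypothetical learnable selector: scores each instance's pooled
    features and returns importance weights normalized over instances.
    W and b would be trained end-to-end with the distillation loss."""
    scores = sigmoid(inst_feats @ W + b)        # (N, 1) raw per-instance scores
    weights = scores / (scores.sum() + 1e-6)    # normalize across instances
    return weights.squeeze(-1)                  # (N,)

def weighted_distill_loss(student_feats, teacher_feats, weights):
    """Per-instance MSE between student and teacher features,
    reweighted by the learned instance importance (an assumed loss form)."""
    per_inst = ((student_feats - teacher_feats) ** 2).mean(axis=1)  # (N,)
    return float((weights * per_inst).sum())

rng = np.random.default_rng(0)
N, D = 4, 8                                   # 4 instances, 8-dim pooled features
s = rng.normal(size=(N, D))                   # student instance features (toy data)
t = rng.normal(size=(N, D))                   # teacher instance features (toy data)
sel_in = np.concatenate([s, t], axis=1)       # selector sees both views
W = rng.normal(size=(2 * D, 1)) * 0.1         # toy selector parameters
b = np.zeros(1)

w = instance_selector(sel_in, W, b)
loss = weighted_distill_loss(s, t, w)
```

Because the weights depend on the student's current features, easier or harder instances are emphasized differently as training progresses, which is the "evolving learning state" aspect the abstract highlights.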