Hard Labels In! Rethinking the Role of Hard Labels in Mitigating Local Semantic Drift

📅 2025-12-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Soft labels in knowledge distillation and dataset distillation suffer from localized semantic drift when only a few crops per image are used: a cropped region that visually resembles another class induces soft predictions that deviate from the image's global semantics, causing a train-test distribution shift. This work presents the first theoretical analysis of this drift mechanism and identifies hard labels — content-agnostic semantic anchors — as effective correctors of soft-label bias. Building on this insight, the authors propose HALD, a training paradigm that jointly leverages soft and hard label supervision for knowledge and dataset distillation. On ImageNet-1K, HALD achieves 42.7% top-1 accuracy using only 285M of storage for soft labels, outperforming LPLD by 9.0% while showing significantly improved cross-benchmark generalization.

📝 Abstract
Soft labels generated by teacher models have become a dominant paradigm for knowledge transfer and for recent large-scale dataset distillation methods such as SRe2L, RDED, and LPLD, offering richer supervision than conventional hard labels. However, we observe that when only a limited number of crops per image are used, soft labels are prone to local semantic drift: a crop may visually resemble another class, causing its soft embedding to deviate from the ground-truth semantics of the original image. This mismatch between local visual content and global semantic meaning introduces systematic errors and distribution misalignment between training and testing. In this work, we revisit the overlooked role of hard labels and show that, when appropriately integrated, they provide a powerful content-agnostic anchor to calibrate semantic drift. We theoretically characterize the emergence of drift under limited soft-label supervision and demonstrate that hybridizing soft and hard labels restores alignment between visual content and semantic supervision. Building on this insight, we propose a new training paradigm, Hard Label for Alleviating Local Semantic Drift (HALD), which leverages hard labels as intermediate corrective signals while retaining the fine-grained advantages of soft labels. Extensive experiments on dataset distillation and large-scale conventional classification benchmarks validate our approach, showing consistent improvements in generalization. On ImageNet-1K, we achieve 42.7% top-1 accuracy with only 285M of storage for soft labels, outperforming the prior state-of-the-art LPLD by 9.0%. Our findings re-establish the importance of hard labels as a complementary tool and call for a rethinking of their role in soft-label-dominated training.
Problem

Research questions and friction points this paper is trying to address.

Soft labels cause semantic drift when few image crops are used
Mismatch between local visual content and global semantic meaning occurs
Hybridizing soft and hard labels restores semantic-visual alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybridizing soft and hard labels to correct semantic drift
Using hard labels as content-agnostic anchors for calibration
Proposing HALD training paradigm with intermediate corrective signals
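The hybrid supervision described above — blending a teacher's soft labels with ground-truth hard labels so the hard label anchors a drifted crop — can be sketched as a weighted loss. This is an illustrative reconstruction, not the paper's actual HALD objective: the mixing weight `alpha`, the function names, and the use of a plain convex combination are all assumptions.

```python
import math

def cross_entropy(probs, hard_label):
    """Hard-label loss: negative log-probability of the ground-truth class."""
    return -math.log(probs[hard_label])

def kl_divergence(teacher_probs, student_probs):
    """Soft-label loss: KL(teacher || student) over class probabilities."""
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0)

def hybrid_loss(student_probs, teacher_probs, hard_label, alpha=0.5):
    """Convex combination of hard- and soft-label supervision.

    alpha=1 recovers pure hard-label training; alpha=0 recovers pure
    soft-label distillation. Intermediate values let the hard label act
    as a content-agnostic anchor against local semantic drift.
    """
    return (alpha * cross_entropy(student_probs, hard_label)
            + (1 - alpha) * kl_divergence(teacher_probs, student_probs))

# A drifted crop: the teacher's soft prediction favors class 2, but the
# image-level hard label is class 0, pulling supervision back on target.
student = [0.5, 0.3, 0.2]
teacher = [0.2, 0.2, 0.6]
loss = hybrid_loss(student, teacher, hard_label=0, alpha=0.5)
```

In practice such losses operate on logits with a distillation temperature, but the probability-level sketch above captures the anchoring idea.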
Jiacheng Cui
Mohamed bin Zayed University of Artificial Intelligence
Dataset Distillation · Efficient Learning
Bingkui Tong
Mohamed bin Zayed University of Artificial Intelligence
Xinyue Bi
University of Ottawa
Efficient Learning
Xiaohan Zhao
Mohamed bin Zayed University of Artificial Intelligence
Efficient Deep Learning · Adversarial Attack
Jiacheng Liu
Mohamed bin Zayed University of Artificial Intelligence
Zhiqiang Shen
Mohamed bin Zayed University of Artificial Intelligence