🤖 AI Summary
Existing dataset distillation methods predominantly rely on static distribution matching (DM), overlooking the dynamic evolution of feature representations during model training—leading to limited expressiveness of synthetic data and suboptimal downstream performance. This paper proposes Trajectory-Guided Dataset Distillation (TGDD), which reformulates distribution matching as a dynamic feature alignment process along the model's training trajectory. TGDD further introduces a distribution-constraint regularization to mitigate class overlap and enhance the semantic diversity and representativeness of the synthetic data. Crucially, these gains come without additional optimization overhead, so TGDD preserves the computational efficiency of DM-based methods while substantially improving synthetic data quality. Extensive experiments across ten benchmark datasets demonstrate that TGDD achieves state-of-the-art performance; notably, it yields an average accuracy gain of 5.0% on high-resolution tasks, validating its effectiveness and strong generalization capability.
📝 Abstract
Dataset distillation compresses large datasets into compact synthetic ones to reduce storage and computational costs. Among various approaches, distribution matching (DM)-based methods have attracted attention for their high efficiency. However, they often overlook the evolution of feature representations during training, which limits the expressiveness of synthetic data and weakens downstream performance. To address this issue, we propose Trajectory-Guided Dataset Distillation (TGDD), which reformulates distribution matching as a dynamic alignment process along the model's training trajectory. At each training stage, TGDD captures the evolving semantics by aligning the feature distributions of the synthetic and original datasets. Meanwhile, it introduces a distribution-constraint regularization to reduce class overlap. This design helps the synthetic data preserve both semantic diversity and representativeness, improving performance on downstream tasks. Without additional optimization overhead, TGDD strikes a favorable balance between performance and efficiency. Experiments on ten datasets demonstrate that TGDD achieves state-of-the-art performance, including a 5.0% average accuracy gain on high-resolution benchmarks.
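The abstract describes two ingredients: class-wise feature alignment repeated at several stages of the model's training trajectory, and a regularizer that discourages class overlap among synthetic features. The sketch below illustrates that combination in NumPy. It is a minimal illustration only: the function names, the inverse-distance form of the separation regularizer, and the random projections standing in for checkpoint feature extractors are all assumptions for demonstration, not the authors' implementation.

```python
import numpy as np

def stage_dm_loss(real_feats, syn_feats, labels_real, labels_syn, lam=0.1):
    """One trajectory stage: align class-wise feature means of real and
    synthetic data (DM-style), plus a hypothetical distribution-constraint
    regularizer that penalizes synthetic class centers sitting close together."""
    classes = np.unique(labels_real)
    align, centers = 0.0, []
    for c in classes:
        mu_real = real_feats[labels_real == c].mean(axis=0)
        mu_syn = syn_feats[labels_syn == c].mean(axis=0)
        align += np.sum((mu_real - mu_syn) ** 2)  # mean-matching term
        centers.append(mu_syn)
    centers = np.stack(centers)
    # Separation term: inverse squared distance between synthetic class
    # centers, so overlapping classes are penalized (illustrative choice).
    sep, k = 0.0, len(classes)
    for i in range(k):
        for j in range(i + 1, k):
            sep += 1.0 / (np.sum((centers[i] - centers[j]) ** 2) + 1e-8)
    return align / k + lam * sep

# Trajectory guidance: average the stage loss over feature extractors saved
# along training; random projections stand in for real checkpoints here.
rng = np.random.default_rng(0)
real, y_real = rng.normal(size=(40, 8)), np.repeat([0, 1], 20)
syn, y_syn = rng.normal(size=(10, 8)), np.repeat([0, 1], 5)
stages = [rng.normal(size=(8, 4)) for _ in range(3)]  # stand-in checkpoints
total = sum(stage_dm_loss(real @ W, syn @ W, y_real, y_syn)
            for W in stages) / len(stages)
```

In a real pipeline the synthetic images would be optimized by backpropagating this trajectory-averaged loss through a differentiable feature extractor; the sketch only evaluates it.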