HIERAMP: Coarse-to-Fine Autoregressive Amplification for Generative Dataset Distillation

📅 2026-03-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing dataset distillation methods rely predominantly on global semantic similarity, which fails to capture the multi-level structural semantics within objects. This work proposes HIERAMP, a hierarchical semantic amplification mechanism built on the Visual Autoregressive (VAR) model, whose coarse-to-fine generation mirrors that hierarchy. At each scale, injected class tokens dynamically identify salient regions, and the maps they induce guide semantic amplification at that scale. The method improves the discriminability and diversity of distilled data without explicitly optimizing global similarity. Experiments show consistent gains across multiple benchmarks, with generated samples exhibiting diverse coarse-grained layouts and focused fine-grained details.
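The per-scale mechanism summarized above can be illustrated with a small sketch. This is not the paper's implementation: the summary does not give the amplification formula, so the softmax saliency map, the multiplicative rule `1 + alpha * saliency`, and every name here (`saliency_map`, `amplify_logits`, `amplify_all_scales`, `alpha`) are assumptions chosen only to show the shape of the idea.

```python
import math

def saliency_map(class_token, position_features):
    """Softmax over positions of class-token/feature dot products (assumed form)."""
    scores = [sum(c * f for c, f in zip(class_token, feat))
              for feat in position_features]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def amplify_logits(logits, saliency, alpha=1.0):
    """Scale each position's token logits by 1 + alpha * (saliency / max saliency).

    Hypothetical rule: the most salient position is scaled by 1 + alpha,
    and positions with near-zero saliency are left almost unchanged.
    """
    top = max(saliency)
    return [[value * (1.0 + alpha * s / top) for value in row]
            for row, s in zip(logits, saliency)]

def amplify_all_scales(class_token, scales, alpha=1.0):
    """Apply the same steps at every scale, coarse to fine.

    `scales` is a list of (position_features, logits) pairs, one per scale.
    """
    amplified = []
    for position_features, logits in scales:
        saliency = saliency_map(class_token, position_features)
        amplified.append(amplify_logits(logits, saliency, alpha))
    return amplified
```

With `alpha=0` the logits pass through unchanged, which makes it easy to compare amplified and unamplified sampling in this toy setting.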

📝 Abstract

Dataset distillation often prioritizes global semantic proximity when creating small surrogate datasets for large-scale originals. However, object semantics are inherently hierarchical. For example, the position and appearance of a bird's eyes are constrained by the outline of its head. Global proximity alone fails to capture how object-relevant structures at different levels support recognition. In this work, we investigate the contributions of hierarchical semantics to effective distilled data. We leverage the visual autoregressive (VAR) model, whose coarse-to-fine generation mirrors this hierarchy, and propose HIERAMP to amplify semantics at different levels. At each VAR scale, we inject class tokens that dynamically identify salient regions and use their induced maps to guide amplification at that scale. This adds only marginal inference cost while steering synthesis toward discriminative parts and structures. Empirically, we find that semantic amplification leads to more diverse token choices in constructing coarse-scale object layouts. Conversely, at fine scales, the amplification concentrates token usage, increasing focus on object-related details. Across popular dataset distillation benchmarks, HIERAMP consistently improves validation performance without explicitly optimizing global proximity, demonstrating the importance of semantic amplification for effective dataset distillation.
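The abstract's observation about token usage (more diverse at coarse scales, more concentrated at fine scales) could be quantified in several ways. The abstract does not say which measure the authors use, so the sketch below assumes Shannon entropy over the codebook indices chosen at one scale. The function name `token_usage_entropy` is hypothetical.

```python
import math
from collections import Counter

def token_usage_entropy(token_ids):
    """Shannon entropy (in bits) of the token indices used at one scale.

    Higher values mean token usage is spread over more codebook entries;
    lower values mean usage is concentrated on a few entries.
    """
    if not token_ids:
        raise ValueError("token_ids must not be empty")
    counts = Counter(token_ids)
    total = len(token_ids)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Under this assumed measure, the reported trend would appear as higher entropy at coarse scales and lower entropy at fine scales once amplification is applied, compared with unamplified generation.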

Problem

Research questions and friction points this paper is trying to address.

dataset distillation

hierarchical semantics

semantic proximity

object structure

generative modeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical semantics

autoregressive generation

dataset distillation