Fixed Anchors Are Not Enough: Dynamic Retrieval and Persistent Homology for Dataset Distillation

📅 2026-02-27

🤖 AI Summary

Existing decoupled dataset distillation methods rely on static real image patches. This reliance creates a fit-complexity gap and a pull-to-anchor effect, which reduce intra-class diversity and hurt generalization. To address this, the work proposes RETA (Retrieval and Topology Alignment). Its Dynamic Retrieval Connection (DRC) adaptively selects, from a prebuilt pool, the real patch that minimizes a fit-complexity score in teacher feature space, then injects it through a residual connection. This tightens the feature fit while controlling the injected complexity. RETA also adds Persistent Topology Alignment (PTA), a regularizer based on persistent homology. PTA builds a mutual k-NN graph over features, computes persistence images of its connected components and loops, and penalizes topological discrepancies between the real and synthetic sets. The method achieves state-of-the-art results on CIFAR-100, Tiny-ImageNet, and ImageNet-1K. On ImageNet-1K with 50 images per class, it reaches 64.3% top-1 accuracy with ResNet-18, 3.1% above the best baseline.
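The retrieval step in DRC amounts to an argmin over the patch pool followed by a residual add. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The summary does not give the exact score, so we assume the fit term is the squared distance between the remaining teacher-feature gap (target feature minus the synthetic image's current feature) and a patch's precomputed teacher feature. We also use total variation as a stand-in complexity measure and treat the synthetic image and patch as equally sized single-channel grids. The helper names (`retrieve_patch`, `inject`), the trade-off weight `lam`, and the residual scale `alpha` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Patch = List[List[float]]              # 2-D single-channel patch (assumption)
PoolEntry = Tuple[List[float], Patch]  # (precomputed teacher feature, patch)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def total_variation(patch: Patch) -> float:
    """Stand-in complexity measure (assumption): sum of |differences| between neighbouring pixels."""
    tv = 0.0
    for i, row in enumerate(patch):
        for j, v in enumerate(row):
            if i + 1 < len(patch):
                tv += abs(patch[i + 1][j] - v)
            if j + 1 < len(row):
                tv += abs(row[j + 1] - v)
    return tv


def retrieve_patch(gap: Sequence[float], pool: List[PoolEntry], lam: float) -> int:
    """Index of the pool entry minimising fit + lam * complexity (hypothetical score).

    gap: target teacher feature minus the synthetic image's current teacher feature.
    Returns -1 for an empty pool.
    """
    best_idx, best_score = -1, math.inf
    for idx, (feat, patch) in enumerate(pool):
        score = sq_dist(gap, feat) + lam * total_variation(patch)
        if score < best_score:
            best_idx, best_score = idx, score
    return best_idx


def inject(syn: Patch, patch: Patch, alpha: float) -> Patch:
    """Residual injection: add the retrieved patch, scaled by alpha, to the synthetic image."""
    return [[s + alpha * p for s, p in zip(srow, prow)] for srow, prow in zip(syn, patch)]


# Toy pool: patch 0 matches the gap exactly but is busy; patch 1 fits slightly worse but is flat.
pool = [
    ([0.8, 0.0], [[0.0, 1.0], [1.0, 0.0]]),
    ([0.7, 0.0], [[0.0, 0.0], [0.0, 0.0]]),
    ([0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]),
]
gap = [0.8, 0.0]
print(retrieve_patch(gap, pool, lam=0.0))  # → 0
print(retrieve_patch(gap, pool, lam=0.1))  # → 1
```

With `lam=0` the pure best fit wins. A small complexity weight already shifts the choice to the smoother patch, which is the fit-versus-complexity trade-off the score is meant to expose.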

📝 Abstract

Decoupled dataset distillation (DD) compresses large corpora into a few synthetic images by matching a frozen teacher's statistics. However, current residual-matching pipelines rely on static real patches, creating a fit-complexity gap and a pull-to-anchor effect that reduce intra-class diversity and hurt generalization. To address these issues, we introduce RETA, a Retrieval and Topology Alignment framework for decoupled DD. First, Dynamic Retrieval Connection (DRC) selects a real patch from a prebuilt pool by minimizing a fit-complexity score in teacher feature space; the chosen patch is injected via a residual connection to tighten feature fit while controlling injected complexity. Second, Persistent Topology Alignment (PTA) regularizes synthesis with persistent homology: we build a mutual k-NN feature graph, compute persistence images of components and loops, and penalize topology discrepancies between real and synthetic sets, mitigating the pull-to-anchor effect. Across CIFAR-100, Tiny-ImageNet, ImageNet-1K, and multiple ImageNet subsets, RETA consistently outperforms various baselines under comparable time and memory budgets, notably reaching 64.3% top-1 accuracy on ImageNet-1K with ResNet-18 at 50 images per class, +3.1% over the best prior method.
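For intuition, the PTA penalty can be approximated using only 0-dimensional persistence (connected components). This minimal sketch is not the authors' method. It assumes Euclidean distances between feature vectors and filters the mutual k-NN graph by edge length. It reads component death times off a union-find (Kruskal-style) pass. Because every component is born at 0, the persistence image collapses to a 1-D, persistence-weighted Gaussian profile over a fixed grid. The sketch omits the loop (1-dimensional) features the abstract also uses, drops components that never merge, and is not differentiable. It therefore shows the quantity being compared, not how gradients flow through it. All function names and the parameters `k`, `grid`, and `sigma` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Edge = Tuple[float, int, int]  # (length, i, j) with i < j


def euclid(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mutual_knn_edges(points: List[Sequence[float]], k: int) -> List[Edge]:
    """Edges (i, j) where each point is among the other's k nearest neighbours, sorted by length."""
    n = len(points)
    dist = [[euclid(points[i], points[j]) for j in range(n)] for i in range(n)]
    knn = []
    for i in range(n):
        row = dist[i]
        # sorted() is stable, so distance ties are broken by index.
        order = sorted((j for j in range(n) if j != i), key=row.__getitem__)
        knn.append(set(order[:k]))
    edges = [(dist[i][j], i, j)
             for i in range(n) for j in range(i + 1, n)
             if j in knn[i] and i in knn[j]]
    return sorted(edges)


def h0_deaths(n: int, edges: List[Edge]) -> List[float]:
    """0-dim persistence: each component is born at 0 and dies at the edge that merges it."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    deaths = []
    for length, i, j in edges:  # increasing edge length = the filtration order
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    # Components that never merge (infinite bars) are not reported.
    return deaths


def persistence_image(deaths: List[float], grid: List[float], sigma: float) -> List[float]:
    """1-D persistence image: Gaussians centred at each death time, weighted by persistence."""
    return [sum(d * math.exp(-((g - d) ** 2) / (2 * sigma ** 2)) for d in deaths)
            for g in grid]


def topology_loss(real: List[Sequence[float]], syn: List[Sequence[float]],
                  k: int, grid: List[float], sigma: float) -> float:
    """Squared L2 distance between the H0 persistence images of two feature sets."""
    def image(points: List[Sequence[float]]) -> List[float]:
        return persistence_image(h0_deaths(len(points), mutual_knn_edges(points, k)), grid, sigma)
    return sum((a - b) ** 2 for a, b in zip(image(real), image(syn)))


line = [[0.0], [1.0], [2.0], [3.0]]
clusters = [[0.0], [1.0], [10.0], [11.0]]
grid = [0.0, 0.5, 1.0, 1.5, 2.0]
print(h0_deaths(4, mutual_knn_edges(clusters, 1)))  # → [1.0, 1.0]
print(topology_loss(line, line, 1, grid, 0.5))      # → 0.0
```

The mutual k-NN rule keeps the two far-apart clusters disconnected, so only the two within-cluster merges appear as finite bars. A synthetic set whose bars differ in number or length from the real set's yields a positive loss.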

Problem

Research questions and friction points this paper is trying to address.

dataset distillation

static anchors

fit-complexity gap

pull-to-anchor effect

intra-class diversity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Retrieval

Persistent Homology

Dataset Distillation