Fixed Anchors Are Not Enough: Dynamic Retrieval and Persistent Homology for Dataset Distillation

📅 2026-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing decoupled dataset distillation methods suffer from a fitting-complexity imbalance and an anchor-attraction effect due to their reliance on static real image patches, which compromises intra-class diversity and generalization. To address this, the paper proposes the RETA framework, which introduces Dynamic Retrieval Connections (DRC) to adaptively select optimal real image patches for injection into residual pathways, thereby balancing model fitting capacity and complexity. Additionally, RETA incorporates a topological alignment regularizer based on persistent homology, aligning the topological structures of synthetic and real data through k-NN feature graphs and persistence images. The method achieves state-of-the-art performance across CIFAR-100, Tiny-ImageNet, and ImageNet-1K; notably, on ImageNet-1K with 50 images per class, it attains 64.3% top-1 accuracy with ResNet-18, surpassing the best baseline by 3.1%.
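The summary describes DRC as picking, from a prebuilt pool, the real patch that minimizes a fit-complexity score in teacher feature space before injecting it through a residual connection. A minimal sketch of what such a selection rule could look like, assuming a Euclidean fit term and a feature-norm complexity proxy; the paper's exact score, weighting, and injection scheme are not reproduced here, and `select_patch` and `lam` are hypothetical names:

```python
import numpy as np

def select_patch(candidates, target, lam=0.1):
    """Pick the pooled patch whose teacher feature best fits the target
    statistic while keeping injected complexity low.

    candidates : (P, d) teacher features of the pooled real patches
    target     : (d,)   feature statistic the synthesis step must match
    lam        : trade-off weight (hypothetical; not from the paper)
    """
    fit = np.linalg.norm(candidates - target, axis=1)   # feature-fit term
    complexity = np.linalg.norm(candidates, axis=1)     # complexity proxy
    score = fit + lam * complexity                      # fit-complexity score
    return int(np.argmin(score))
```

In a decoupled pipeline the chosen patch would then be added to the synthetic image along a residual pathway (e.g. `synthetic + alpha * patch`), tightening the feature fit while bounding how much extra complexity the anchor injects.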

📝 Abstract
Decoupled dataset distillation (DD) compresses large corpora into a few synthetic images by matching a frozen teacher's statistics. However, current residual-matching pipelines rely on static real patches, creating a fit-complexity gap and a pull-to-anchor effect that reduce intra-class diversity and hurt generalization. To address these issues, we introduce RETA -- a Retrieval and Topology Alignment framework for decoupled DD. First, Dynamic Retrieval Connection (DRC) selects a real patch from a prebuilt pool by minimizing a fit-complexity score in teacher feature space; the chosen patch is injected via a residual connection to tighten feature fit while controlling injected complexity. Second, Persistent Topology Alignment (PTA) regularizes synthesis with persistent homology: we build a mutual k-NN feature graph, compute persistence images of components and loops, and penalize topology discrepancies between real and synthetic sets, mitigating the pull-to-anchor effect. Across CIFAR-100, Tiny-ImageNet, ImageNet-1K, and multiple ImageNet subsets, RETA consistently outperforms various baselines under comparable time and memory budgets, notably reaching 64.3% top-1 accuracy on ImageNet-1K with ResNet-18 at 50 images per class, +3.1% over the best prior method.
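The abstract outlines PTA as a three-step pipeline: build a mutual k-NN graph over features, summarize its persistent homology as persistence images, and penalize the discrepancy between real and synthetic summaries. The sketch below illustrates only the 0-dimensional (connected-component) part of that idea in plain NumPy, with a Kruskal-style union-find sweep for H0 persistence and a simple 1-D Gaussian rasterization as a stand-in for a persistence image. The loop (H1) features, the exact filtration, and the 2-D persistence-image parameterization used in the paper are not reproduced, and all function names here are hypothetical:

```python
import numpy as np

def mutual_knn_edges(X, k=3):
    """Edges (i, j, dist) where i and j are each in the other's k-NN set."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                 # exclude self-neighbours
    nn = np.argsort(D, axis=1)[:, :k]           # k nearest neighbours per point
    knn = [set(row) for row in nn]
    edges = []
    for i in range(len(X)):
        for j in knn[i]:
            if j > i and i in knn[j]:           # mutual, deduplicated
                edges.append((i, int(j), D[i, j]))
    return edges

def h0_persistence(n, edges):
    """H0 persistence: every point is born at 0; a component dies at the
    edge length that merges it into another (Kruskal sweep + union-find)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]       # path halving
            x = parent[x]
        return x
    deaths = []
    for i, j, w in sorted(edges, key=lambda e: e[2]):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(w)                    # one component dies at w
    return np.array(deaths)

def persistence_image(deaths, grid=np.linspace(0.0, 2.0, 20), sigma=0.1):
    """Rasterise (birth=0, death) pairs into a 1-D persistence image:
    a persistence-weighted Gaussian bump per death time."""
    img = np.zeros_like(grid)
    for d in deaths:
        img += d * np.exp(-(grid - d) ** 2 / (2 * sigma ** 2))
    return img

def topo_loss(A, B, k=3):
    """Squared distance between the two sets' H0 persistence images."""
    pa = persistence_image(h0_persistence(len(A), mutual_knn_edges(A, k)))
    pb = persistence_image(h0_persistence(len(B), mutual_knn_edges(B, k)))
    return float(np.sum((pa - pb) ** 2))

rng = np.random.default_rng(0)
real = rng.normal(size=(30, 8))                 # stand-in teacher features
syn = rng.normal(size=(30, 8))
```

In this reading, `topo_loss(real, syn)` would act as the regularizer added to the synthesis objective: it is zero when the two feature sets share the same component-merge structure and grows as their graph topologies diverge, which is how a term of this shape could discourage synthetic samples from collapsing onto their anchors.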
Problem

Research questions and friction points this paper is trying to address.

dataset distillation
static anchors
fit-complexity gap
pull-to-anchor effect
intra-class diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Retrieval
Persistent Homology
Dataset Distillation
Topology Alignment
Decoupled Distillation
Authors: Muquan Li, Hang Gou, Yingyi Ma, Rongzheng Wang, Ke Qin (The Laboratory of Intelligent Collaborative Computing of UESTC); Tao He (UESTC)
Image Retrieval · Computer Vision