FocusDD: Real-World Scene Infusion for Robust Dataset Distillation

📅 2025-01-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the poor generalization and low efficiency of dataset distillation on large-scale, high-resolution image data in classification and detection tasks, this paper proposes a resolution-agnostic, focus-driven distillation framework. Methodologically, it leverages a pre-trained Vision Transformer (ViT) to localize salient image patches and synthesize multi-object distilled images; introduces a novel dual-scale distillation mechanism that jointly exploits distilled views and downsampled original-image views; and supports end-to-end gradient-based optimization with joint data augmentation. Notably, it achieves the first effective transfer of distilled data to object detection, demonstrated on YOLOv11. Experiments show that, on ImageNet-1K with 100 images per class (IPC), the distilled data yields 71.0% and 62.6% top-1 accuracy for ResNet-50 and MobileNet-v2, respectively. On COCO2017 with 50 IPC, it attains 24.4% and 32.1% mAP for YOLOv11n and YOLOv11s, surpassing state-of-the-art methods.

📝 Abstract
Dataset distillation has emerged as a strategy to compress real-world datasets for efficient training. However, it struggles with large-scale and high-resolution datasets, limiting its practicality. This paper introduces a novel resolution-independent dataset distillation method, Focused Dataset Distillation (FocusDD), which achieves diversity and realism in distilled data by identifying key information patches, thereby ensuring the generalization capability of the distilled dataset across different network architectures. Specifically, FocusDD leverages a pre-trained Vision Transformer (ViT) to extract key image patches, which are then synthesized into a single distilled image. These distilled images, which capture multiple targets, are suitable not only for classification tasks but also for dense tasks such as object detection. To further improve the generalization of the distilled dataset, each synthesized image is augmented with a downsampled view of the original image. Experimental results on the ImageNet-1K dataset demonstrate that, with 100 images per class (IPC), ResNet-50 and MobileNet-v2 achieve validation accuracies of 71.0% and 62.6%, respectively, outperforming state-of-the-art methods by 2.8% and 4.7%. Notably, FocusDD is the first method to use distilled datasets for object detection tasks. On the COCO2017 dataset, with an IPC of 50, YOLOv11n and YOLOv11s achieve 24.4% and 32.1% mAP, respectively, further validating the effectiveness of our approach.
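The abstract's pipeline (score patches with a ViT, tile the most salient ones into one multi-object distilled image, and pair it with a downsampled view of the original) can be illustrated with a minimal sketch. This is not the authors' implementation: the attention scores are passed in as plain arrays standing in for a real pre-trained ViT, the grid tiling is a simplified composition step, and the stride-based downsampling is a stand-in for whatever resampling FocusDD actually uses.

```python
import numpy as np

def select_key_patches(attn_scores, k):
    """Return indices of the k highest-scoring patches.
    In FocusDD these scores would come from a pre-trained ViT's
    attention; here they are supplied directly for illustration."""
    return np.argsort(attn_scores)[::-1][:k]

def compose_distilled_image(patches, grid=(2, 2)):
    """Tile the selected patches into one multi-object distilled image."""
    rows, cols = grid
    assert len(patches) == rows * cols
    row_imgs = [
        np.concatenate(patches[r * cols:(r + 1) * cols], axis=1)
        for r in range(rows)
    ]
    return np.concatenate(row_imgs, axis=0)

def downsample(image, factor=2):
    """Naive strided downsampling standing in for the low-resolution
    view of the original image that augments each synthesized sample."""
    return image[::factor, ::factor]

# Toy example: 16 patches of size 8x8x3 with random "attention" scores.
rng = np.random.default_rng(0)
patches = rng.random((16, 8, 8, 3))
attn = rng.random(16)

top4 = select_key_patches(attn, 4)                      # indices of 4 key patches
distilled = compose_distilled_image(
    [patches[i] for i in top4], grid=(2, 2))            # (16, 16, 3) mosaic
low_res_view = downsample(rng.random((32, 32, 3)), 2)   # (16, 16, 3) view
```

Because each distilled image is a mosaic of several salient regions, it carries multiple targets, which is what makes the same distilled set usable for detection as well as classification.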
Problem

Research questions and friction points this paper is trying to address.

Dataset Distillation
Image Classification
Object Detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

FocusDD
Dataset Distillation
Object Detection