Dataset Distillation for Super-Resolution without Class Labels and Pre-trained Models

📅 2025-09-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing GAN inversion-based data distillation methods for single-image super-resolution (SISR) rely heavily on pre-trained models and class labels, severely limiting their generalizability. Method: the first class-agnostic and model-agnostic SISR data distillation framework: (i) extract high-gradient image patches; (ii) cluster them without supervision using CLIP visual features; and (iii) fine-tune a diffusion model on the selected patches to synthesize high-fidelity distilled data, on which a Transformer-based super-resolution network is trained. Contribution/Results: the method achieves near-lossless performance (only 0.3 dB PSNR degradation) using merely 0.68% of the original training samples. Diffusion model fine-tuning takes just 4 hours and SISR network training only 1 hour, versus 11 hours for full-dataset training. To our knowledge, this is the first SISR distillation approach that eliminates dependence on both class annotations and pre-trained networks, improving generalization under data scarcity as well as training efficiency.

📝 Abstract
Training deep neural networks has become increasingly demanding, requiring large datasets and significant computational resources, especially as model complexity grows. Data distillation methods, which aim to improve data efficiency, have emerged as promising solutions to this challenge. In the field of single image super-resolution (SISR), the reliance on large training datasets highlights the importance of these techniques. Recently, a generative adversarial network (GAN) inversion-based data distillation framework for SR was proposed, showing potential for better data utilization. However, that method depends heavily on pre-trained SR networks and class-specific information, limiting its generalizability and applicability. To address these issues, we introduce a new data distillation approach for image SR that needs neither class labels nor pre-trained SR models. In particular, we first extract high-gradient patches and categorize images based on CLIP features, then fine-tune a diffusion model on the selected patches to learn their distribution and synthesize distilled training images. Experimental results show that our method achieves state-of-the-art performance while using significantly less training data and requiring less computational time. Specifically, when we train a baseline Transformer model for SR with only 0.68% of the original dataset, the performance drop is just 0.3 dB. In this case, diffusion model fine-tuning takes 4 hours, and SR model training completes within 1 hour, much shorter than the 11-hour training time with the full dataset.
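The first stage of the pipeline, selecting high-gradient patches, can be sketched roughly as below. The paper's exact scoring rule and threshold are not given on this page, so the mean-gradient-magnitude score, the patch/stride sizes, and the top-k keep ratio here are all assumptions for illustration:

```python
import numpy as np

def gradient_score(patch: np.ndarray) -> float:
    """Mean gradient magnitude of a grayscale patch (finite differences)."""
    gy, gx = np.gradient(patch.astype(np.float64))
    return float(np.mean(np.hypot(gx, gy)))

def select_high_gradient_patches(image, patch=64, stride=64, keep_ratio=0.25):
    """Tile the image into patches and keep the top fraction by gradient score.

    `keep_ratio` is a hypothetical knob, not a value from the paper.
    """
    h, w = image.shape[:2]
    patches = [image[y:y + patch, x:x + patch]
               for y in range(0, h - patch + 1, stride)
               for x in range(0, w - patch + 1, stride)]
    scores = [gradient_score(p.mean(axis=-1) if p.ndim == 3 else p)
              for p in patches]
    k = max(1, int(len(patches) * keep_ratio))
    order = np.argsort(scores)[::-1][:k]
    return [patches[i] for i in order]

# Toy example: a flat region scores lower than a textured one,
# so only the textured (right-half) patches survive selection.
rng = np.random.default_rng(0)
img = np.zeros((128, 128))
img[:, 64:] = rng.random((128, 64))   # right half is "textured"
kept = select_high_gradient_patches(img, patch=64, stride=64, keep_ratio=0.5)
```

The intuition is that high-gradient patches carry the edges and textures that matter most for SR, so distilling from them preserves the hard cases while discarding flat, uninformative regions.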
Problem

Research questions and friction points this paper is trying to address.

Distilling datasets for super-resolution without class labels
Eliminating dependency on pre-trained models for SR
Reducing computational resources and training data needs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses CLIP features for image categorization
Fine-tunes diffusion model on high-gradient patches
Synthesizes distilled images without pre-trained models
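The unsupervised categorization step above could look roughly like the following: embed each patch with a CLIP image encoder, then cluster the embeddings. Real CLIP features require a model download, so this sketch substitutes toy vectors and a plain Lloyd's k-means with farthest-point initialization; the clustering algorithm and `k` are assumptions, not details confirmed by this page:

```python
import numpy as np

def kmeans(features: np.ndarray, k: int, iters: int = 20):
    """Lloyd's k-means with farthest-point init.

    `features` stands in for CLIP visual embeddings (one row per patch).
    """
    centers = features[[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all chosen centers.
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :],
                           axis=-1).min(axis=1)
        centers = np.vstack([centers, features[[d.argmax()]]])
    for _ in range(iters):
        # Assign each feature to its nearest center, then recompute means.
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        centers = np.stack([features[labels == c].mean(axis=0)
                            if np.any(labels == c) else centers[c]
                            for c in range(k)])
    return labels, centers

# Toy "embeddings": two well-separated blobs instead of real CLIP features.
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0.0, 0.1, (50, 8)),   # pseudo-category A
                   rng.normal(5.0, 0.1, (50, 8))])  # pseudo-category B
labels, _ = kmeans(feats, k=2)
```

Clustering on CLIP features gives the method class-like groupings for free, which is what lets it drop the explicit class labels that prior GAN inversion-based distillation required.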
Sunwoo Cho
Department of Electrical and Computer Engineering, INMC, Seoul National University, Seoul 08826, Republic of Korea
Yejin Jung
Graduate School of Engineering Practice, INMC, Seoul National University, Seoul 08826, Republic of Korea
Nam Ik Cho
Seoul National University, Dept. of Electrical and Computer Engineering
Image Processing · Signal Processing · Adaptive Filtering · Computer Vision
Jae Woong Soh
Gwangju Institute of Science and Technology (GIST)
Computer Vision · Image Processing · Deep Learning