Dataset Distillation for Super-Resolution without Class Labels and Pre-trained Models

📅 2025-09-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing GAN inversion-based data distillation methods for single-image super-resolution (SISR) rely heavily on pre-trained models and class labels, severely limiting their generalizability. Method: the first class-agnostic and model-agnostic SISR data distillation framework: (i) extract high-gradient image patches; (ii) cluster them without supervision using CLIP visual features; and (iii) fine-tune a diffusion model on the selected patches to synthesize high-fidelity distilled data, on which a Transformer-based super-resolution network is trained. Contribution/Results: the method achieves near-lossless performance (only 0.3 dB PSNR degradation) using merely 0.68% of the original training samples. Diffusion model fine-tuning takes just 4 hours and SISR network training only 1 hour, versus 11 hours for full-dataset training. To our knowledge, this is the first SISR distillation approach that eliminates dependence on both class annotations and pre-trained networks, improving generalization under data scarcity as well as training efficiency.

📝 Abstract
Training deep neural networks has become increasingly demanding, requiring large datasets and significant computational resources, especially as model complexity grows. Data distillation methods, which aim to improve data efficiency, have emerged as promising solutions to this challenge. In the field of single image super-resolution (SISR), the reliance on large training datasets highlights the importance of these techniques. Recently, a generative adversarial network (GAN) inversion-based data distillation framework for SR was proposed, showing potential for better data utilization. However, that method depends heavily on pre-trained SR networks and class-specific information, limiting its generalizability and applicability. To address these issues, we introduce a new data distillation approach for image SR that needs neither class labels nor pre-trained SR models. In particular, we first extract high-gradient patches and categorize images based on CLIP features, then fine-tune a diffusion model on the selected patches to learn their distribution and synthesize distilled training images. Experimental results show that our method achieves state-of-the-art performance while using significantly less training data and requiring less computational time. Specifically, when we train a baseline Transformer model for SR with only 0.68% of the original dataset, the performance drop is just 0.3 dB. In this case, diffusion model fine-tuning takes 4 hours, and SR model training completes within 1 hour, much shorter than the 11-hour training time with the full dataset.
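The first stage of the pipeline, selecting high-gradient patches, can be sketched roughly as below. The paper's exact scoring rule and threshold are not given on this page, so the mean-gradient-magnitude score, the patch/stride sizes, and the top-k keep ratio here are all assumptions for illustration:

```python
import numpy as np

def gradient_score(patch: np.ndarray) -> float:
    """Mean gradient magnitude of a grayscale patch (finite differences)."""
    gy, gx = np.gradient(patch.astype(np.float64))
    return float(np.mean(np.hypot(gx, gy)))

def select_high_gradient_patches(image, patch=64, stride=64, keep_ratio=0.25):
    """Tile the image into patches and keep the top fraction by gradient score.

    `keep_ratio` is a hypothetical knob, not a value from the paper.
    """
    h, w = image.shape[:2]
    patches = [image[y:y + patch, x:x + patch]
               for y in range(0, h - patch + 1, stride)
               for x in range(0, w - patch + 1, stride)]
    scores = [gradient_score(p.mean(axis=-1) if p.ndim == 3 else p)
              for p in patches]
    k = max(1, int(len(patches) * keep_ratio))
    order = np.argsort(scores)[::-1][:k]
    return [patches[i] for i in order]

# Toy example: a flat region scores lower than a textured one,
# so only the textured (right-half) patches survive selection.
rng = np.random.default_rng(0)
img = np.zeros((128, 128))
img[:, 64:] = rng.random((128, 64))   # right half is "textured"
kept = select_high_gradient_patches(img, patch=64, stride=64, keep_ratio=0.5)
```

The intuition is that high-gradient patches carry the edges and textures that matter most for SR, so distilling from them preserves the hard cases while discarding flat, uninformative regions.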
Problem

Research questions and friction points this paper is trying to address.

Distilling datasets for super-resolution without class labels
Eliminating dependency on pre-trained models for SR
Reducing computational resources and training data needs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses CLIP features for image categorization
Fine-tunes diffusion model on high-gradient patches
Synthesizes distilled images without pre-trained models
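The unsupervised categorization step above could look roughly like the following: embed each patch with a CLIP image encoder, then cluster the embeddings. Real CLIP features require a model download, so this sketch substitutes toy vectors and a plain Lloyd's k-means with farthest-point initialization; the clustering algorithm and `k` are assumptions, not details confirmed by this page:

```python
import numpy as np

def kmeans(features: np.ndarray, k: int, iters: int = 20):
    """Lloyd's k-means with farthest-point init.

    `features` stands in for CLIP visual embeddings (one row per patch).
    """
    centers = features[[0]]
    for _ in range(1, k):
        # Next center: the point farthest from all chosen centers.
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :],
                           axis=-1).min(axis=1)
        centers = np.vstack([centers, features[[d.argmax()]]])
    for _ in range(iters):
        # Assign each feature to its nearest center, then recompute means.
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        centers = np.stack([features[labels == c].mean(axis=0)
                            if np.any(labels == c) else centers[c]
                            for c in range(k)])
    return labels, centers

# Toy "embeddings": two well-separated blobs instead of real CLIP features.
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0.0, 0.1, (50, 8)),   # pseudo-category A
                   rng.normal(5.0, 0.1, (50, 8))])  # pseudo-category B
labels, _ = kmeans(feats, k=2)
```

Clustering on CLIP features gives the method class-like groupings for free, which is what lets it drop the explicit class labels that prior GAN inversion-based distillation required.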
Sunwoo Cho
Department of Electrical and Computer Engineering, INMC, Seoul National University, Seoul 08826, Republic of Korea
Yejin Jung
Graduate School of Engineering Practice, INMC, Seoul National University, Seoul 08826, Republic of Korea
Nam Ik Cho
Seoul National University, Dept. of Electrical and Computer Engineering
Image Processing · Signal Processing · Adaptive Filtering · Computer Vision
Jae Woong Soh
Gwangju Institute of Science and Technology (GIST)
Computer Vision · Image Processing · Deep Learning