Dataset Condensation with Color Compensation

📅 2025-08-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Dataset condensation inherently faces a trade-off between efficiency and fidelity: image-level selection methods condense inefficiently, while pixel-level distillation often introduces semantic distortion. This work identifies color as both an information carrier and a basic semantic unit, and proposes DC3, a color-compensation-based condensation framework. Rather than synthesizing brand-new images, DC3 uses a pre-trained latent diffusion model to enhance the color diversity of selected images, enriching their semantic expressiveness while preserving their structure. Combined with a calibrated selection strategy and latent-space optimization, DC3 addresses both the semantic-fidelity and the efficiency bottleneck, and it is the first work to fine-tune diffusion models directly on condensed data. On multiple benchmarks, DC3 significantly outperforms state-of-the-art methods, reducing FID by 12–28%, while training remains stable and free of model collapse.

📝 Abstract

Dataset condensation faces an inherent trade-off: balancing performance and fidelity under extreme compression. Existing methods struggle with two bottlenecks: image-level selection methods (Coreset Selection, Dataset Quantization) suffer from inefficient condensation, while pixel-level optimization (Dataset Distillation) introduces semantic distortion due to over-parameterization. Based on empirical observations, we find that a critical problem in dataset condensation is the oversight of color's dual role as an information carrier and a basic unit of semantic representation. We argue that improving the colorfulness of condensed images benefits representation learning. Motivated by this, we propose DC3, a Dataset Condensation framework with Color Compensation. After a calibrated selection step, DC3 uses a latent diffusion model to enhance the color diversity of each selected image rather than creating a brand-new one. Extensive experiments demonstrate that DC3 generalizes well and outperforms SOTA methods across multiple benchmarks. To the best of our knowledge, beyond its use for downstream tasks, DC3 is the first work to fine-tune pre-trained diffusion models with condensed datasets. The FID results show that networks can be trained on our high-quality datasets without model collapse or other degradation issues. Code and generated data will be released soon.
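The abstract argues that the colorfulness of condensed images matters, but it does not say here how colorfulness is measured. The sketch below assumes one common choice, the Hasler and Süsstrunk colorfulness metric computed on opponent color channels, which could be used to compare a condensed set before and after color compensation. The function names `colorfulness` and `mean_colorfulness` are hypothetical and not part of DC3.

```python
import numpy as np


def colorfulness(image: np.ndarray) -> float:
    """Hasler-Suesstrunk style colorfulness of an RGB image of shape (H, W, 3).

    Assumption: this metric is an illustrative stand-in; the DC3 paper may
    measure color diversity differently.
    """
    img = image.astype(np.float64)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    rg = r - g                # red-green opponent channel
    yb = 0.5 * (r + g) - b    # yellow-blue opponent channel
    # Spread of the opponent channels plus a weighted term for their mean offset.
    std_root = np.sqrt(rg.std() ** 2 + yb.std() ** 2)
    mean_root = np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
    return float(std_root + 0.3 * mean_root)


def mean_colorfulness(images) -> float:
    """Average colorfulness over a collection of RGB images (e.g. a condensed set)."""
    return float(np.mean([colorfulness(im) for im in images]))
```

A uniform gray image scores 0, since both opponent channels vanish; more saturated or more varied colors raise the score.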

Problem

Research questions and friction points this paper is trying to address.

Balancing performance and fidelity in dataset condensation

Addressing semantic distortion from over-parameterization in condensation

Enhancing color diversity to improve representation learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Color compensation enhances condensed dataset diversity

Latent diffusion model improves color representation

Calibrated selection strategy prevents semantic distortion
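The innovations above describe a two-stage pipeline: select real images, then enhance their color instead of generating new ones. The following is a minimal sketch of that structure only, under two loud assumptions: the calibrated selection is replaced by a hypothetical precomputed per-sample score (the actual criterion is not described here), and the latent diffusion color compensation is replaced by a toy saturation boost. The names `select_per_class`, `boost_saturation`, and `condense` are hypothetical; this is not DC3's implementation.

```python
from typing import Callable, Tuple

import numpy as np


def select_per_class(labels: np.ndarray, scores: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k highest-scoring samples in each class.

    Placeholder for DC3's calibrated selection: `scores` is assumed to be given.
    """
    chosen = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        top = idx[np.argsort(scores[idx])[::-1][:k]]  # descending by score
        chosen.extend(top.tolist())
    return np.array(sorted(chosen), dtype=int)


def boost_saturation(image: np.ndarray, alpha: float = 1.3) -> np.ndarray:
    """Toy stand-in for diffusion-based color compensation.

    Pushes each pixel away from its gray (luma) value, which increases
    saturation while keeping the image layout unchanged.
    """
    img = image.astype(np.float64)
    gray = img @ np.array([0.299, 0.587, 0.114])  # per-pixel luma, shape (H, W)
    out = gray[..., None] + alpha * (img - gray[..., None])
    return np.clip(out, 0.0, 255.0)


def condense(
    images: np.ndarray,
    labels: np.ndarray,
    scores: np.ndarray,
    k: int,
    enhance: Callable[[np.ndarray], np.ndarray] = boost_saturation,
) -> Tuple[np.ndarray, np.ndarray]:
    """Select k images per class, then enhance (not regenerate) each one."""
    idx = select_per_class(labels, scores, k)
    return np.stack([enhance(images[i]) for i in idx]), labels[idx]
```

Passing `enhance` as a parameter keeps the selection stage independent of the enhancement stage, so a real diffusion-based editor could be swapped in for the toy function.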

🔎 Similar Papers

Elucidating the Design Space of Dataset Condensation