Dataset Color Quantization: A Training-Oriented Framework for Dataset-Level Compression

📅 2026-02-24
🤖 AI Summary
This work proposes a unified, training-aware framework for compressing image datasets. It targets the high storage cost of large-scale datasets by exploiting intra-image color redundancy, which existing methods tend to overlook because they shrink datasets mainly by discarding samples. The approach enforces consistent palette representations across similar images, selects semantically important colors guided by model perception, and preserves the structural details needed for feature learning. Evaluated on CIFAR-10/100, Tiny-ImageNet, and ImageNet-1K, it improves downstream training performance under aggressive compression, balancing storage efficiency against training efficacy.

📝 Abstract
Large-scale image datasets are fundamental to deep learning, but their high storage demands pose challenges for deployment in resource-constrained environments. While existing approaches reduce dataset size by discarding samples, they often ignore the significant redundancy within each image, particularly in the color space. To address this, we propose Dataset Color Quantization (DCQ), a unified framework that compresses visual datasets by reducing color-space redundancy while preserving information crucial for model training. DCQ achieves this by enforcing consistent palette representations across similar images, selectively retaining semantically important colors guided by model perception, and maintaining structural details necessary for effective feature learning. Extensive experiments across CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet-1K show that DCQ significantly improves training performance under aggressive compression, offering a scalable and robust solution for dataset-level storage reduction. Code is available at https://github.com/he-y/Dataset-Color-Quantization.
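
To make the idea of a palette shared across similar images concrete, here is a minimal sketch that fits a single k-color palette to the pooled pixels of a group of images and then maps each image onto it. It uses plain k-means in RGB space as a stand-in and is not the DCQ procedure: the abstract does not specify how images are grouped, how model perception guides color selection, or how structure is preserved, so none of that is modeled here. The function names (`fit_shared_palette`, `assign`, `quantize`, `bits_per_pixel`) and the parameters `k`, `iters`, and `seed` are hypothetical.

```python
import numpy as np


def fit_shared_palette(images, k=8, iters=10, seed=0):
    """Fit one k-color palette to the pooled pixels of a group of images.

    Plain k-means (Lloyd's algorithm) in RGB space, used only as an
    illustrative stand-in; this is not the DCQ palette-selection method.
    """
    rng = np.random.default_rng(seed)
    # Pool the pixels of all images so that they share a single palette.
    pixels = np.concatenate([img.reshape(-1, 3) for img in images]).astype(np.float64)
    # Initialize centers from k distinct pixel positions.
    palette = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        labels = assign(pixels, palette)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                palette[j] = members.mean(axis=0)
    return palette


def assign(pixels, palette):
    """Index of the nearest palette color (squared RGB distance) per pixel."""
    dists = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def quantize(image, palette):
    """Return (index map, reconstructed image) for one HxWx3 uint8 image."""
    h, w, _ = image.shape
    idx = assign(image.reshape(-1, 3).astype(np.float64), palette)
    recon = np.rint(palette[idx]).astype(np.uint8).reshape(h, w, 3)
    return idx.reshape(h, w), recon


def bits_per_pixel(k):
    """Bits needed to store one palette index (raw RGB uses 24)."""
    return max(1, int(np.ceil(np.log2(k))))
```

Under these assumptions, an 8-color palette needs 3 bits per pixel for the index map instead of 24 bits for raw RGB, before counting the small palette itself or any further entropy coding. Because the palette is fitted once per group rather than once per image, its storage cost is amortized over every image in the group.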
Problem

Research questions and friction points this paper is trying to address.

dataset compression
color quantization
storage reduction
image redundancy
deep learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dataset Color Quantization
color-space redundancy
training-oriented compression
consistent palette representation
semantic color preservation
Chenyue Yu
CFAR, Agency for Science, Technology and Research, Singapore; IHPC, Agency for Science, Technology and Research, Singapore; National University of Singapore
Lingao Xiao
National University of Singapore
Efficient Deep Learning
Jinhong Deng
CFAR, Agency for Science, Technology and Research, Singapore; IHPC, Agency for Science, Technology and Research, Singapore; University of Electronic Science and Technology of China (UESTC)
Ivor W. Tsang
CFAR, Agency for Science, Technology and Research, Singapore; IHPC, Agency for Science, Technology and Research, Singapore; Nanyang Technological University (NTU), Singapore
Yang He
A*STAR & NUS
Machine Learning, Computer Vision