Distilling Dataset into Neural Field

📅 2025-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high computational and memory overhead of training on large-scale datasets, this paper proposes a dataset distillation framework based on implicit neural representations (INRs). Unlike conventional explicit sample-based distillation, the approach parameterizes each synthetic datum as a lightweight neural field that implicitly encodes essential training information via a coordinate-to-feature mapping, enabling unified distillation across modalities (images, video, audio, and 3D voxels). The authors theoretically establish that, under identical parameter budgets, this representation is more expressive than existing parameterizations. Experiments on multimodal benchmarks demonstrate that the synthesized data volume can be reduced by over 99% while preserving model training accuracy comparable to that achieved with full datasets, significantly outperforming state-of-the-art distillation approaches. Code and models are publicly available.
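To make the coordinate-to-feature idea concrete, here is a minimal numpy sketch of a coordinate-based neural field: a tiny MLP whose weights act as the stored "synthetic datum" and which can be evaluated on a coordinate grid of any resolution. The architecture, layer sizes, and sinusoidal first layer are illustrative assumptions for demonstration, not the network DDiF actually uses.

```python
import numpy as np

# Illustrative sketch only: the weights below play the role of one stored
# synthetic datum; the layer sizes and sine activation are assumptions,
# not the paper's architecture.
rng = np.random.default_rng(0)
W1 = rng.normal(0, 1.0, size=(2, 32))   # (x, y) coordinates -> 32 hidden features
b1 = np.zeros(32)
W2 = rng.normal(0, 0.1, size=(32, 3))   # hidden features -> RGB
b2 = np.zeros(3)

def decode(height, width):
    """Evaluate the field on a coordinate grid of the requested resolution."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, height),
                         np.linspace(-1, 1, width), indexing="ij")
    coords = np.stack([xs, ys], axis=-1).reshape(-1, 2)   # (H*W, 2)
    h = np.sin(coords @ W1 + b1)                          # sinusoidal layer
    rgb = 1 / (1 + np.exp(-(h @ W2 + b2)))                # values in (0, 1)
    return rgb.reshape(height, width, 3)

# The same stored weights decode to data of different shapes.
img_small = decode(16, 16)
img_large = decode(64, 64)
print(img_small.shape, img_large.shape)   # (16, 16, 3) (64, 64, 3)
```

Decoding the same weights at two resolutions illustrates why a neural field "easily generates various shapes of data": the stored object is a function over coordinates, not a fixed pixel grid.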

📝 Abstract
Utilizing a large-scale dataset is essential for training high-performance deep learning models, but it also comes with substantial computation and storage costs. To overcome these challenges, dataset distillation has emerged as a promising solution by compressing the large-scale dataset into a smaller synthetic dataset that retains the essential information needed for training. This paper proposes a novel parameterization framework for dataset distillation, coined Distilling Dataset into Neural Field (DDiF), which leverages the neural field to store the necessary information of the large-scale dataset. Due to the unique nature of the neural field, which takes coordinates as input and outputs quantities, DDiF effectively preserves the information and easily generates data of various shapes. We theoretically confirm that DDiF exhibits greater expressiveness than some previous approaches when the utilized budget for a single synthetic instance is the same. Through extensive experiments, we demonstrate that DDiF achieves superior performance on several benchmark datasets, extending beyond the image domain to include video, audio, and 3D voxels. We release the code at https://github.com/aailab-kaist/DDiF.
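The abstract's expressiveness claim is stated per parameter budget, so a back-of-the-envelope comparison helps: storing one image explicitly costs one value per pixel channel, while a neural field costs only its weight count regardless of decoding resolution. The numbers below (a 32x32 RGB image and a hypothetical 2-32-3 MLP) are illustrative assumptions, not budgets reported in the paper.

```python
# Illustrative budget arithmetic; the image size and field architecture
# are assumptions for demonstration, not the paper's reported settings.
explicit_floats = 32 * 32 * 3                 # one value per pixel channel
field_floats = (2 * 32 + 32) + (32 * 3 + 3)   # weights + biases of a 2-32-3 MLP
print(explicit_floats, field_floats)          # 3072 195
```

Under these toy numbers, the field stores far fewer parameters than the explicit pixel grid, yet it can still be decoded at any resolution; the paper's theoretical result makes the same-budget comparison precise.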
Problem

Research questions and friction points this paper is trying to address.

High computation and storage costs of training on large-scale datasets.
Compressing a large dataset into a small synthetic set that retains training-essential information.
Extending dataset distillation beyond fixed-shape, image-only parameterizations.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural field parameterization for dataset compression
Coordinate-based encoding that preserves information and decodes data of various shapes
Superior benchmark performance across image, video, audio, and 3D voxel domains
Donghyeok Shin
Korea Advanced Institute of Science and Technology (KAIST)
Heesun Bae
Korea Advanced Institute of Science and Technology (KAIST)
Gyuwon Sim
Korea Advanced Institute of Science and Technology (KAIST)
Wanmo Kang
Korea Advanced Institute of Science and Technology (KAIST)
Il-Chul Moon
Professor, Department of Industrial and Systems Engineering, KAIST
Modeling and Simulation · Artificial Intelligence