Skipping the Zeros in Diffusion Models for Sparse Data Generation

📅 2026-05-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Diffusion models struggle to preserve the sparsity patterns of sparse continuous data: they do not model the exact zeros that represent the deliberate absence of a signal, so they corrupt zero-valued entries and spend computation on them unnecessarily. To address this, the work proposes Sparsity-Exploiting Diffusion (SED), a diffusion framework that skips zero-valued positions during training and inference and models only the non-zero values. SED thereby preserves sparsity and reduces computation while maintaining or improving generation quality. On physics and biology benchmarks, SED matches or surpasses conventional diffusion models and domain-specific baselines, and vision experiments illustrate the limitations of dense diffusion models and the benefits of SED.

📝 Abstract

Diffusion models (DMs) excel on dense continuous data, but are not designed for sparse continuous data. They do not model exact zeros that represent the deliberate absence of a signal. As a result, they erase sparsity patterns and perform unnecessary computation on mostly zero entries. With Sparsity-Exploiting Diffusion (SED), we model only non-zero values, preserving sparsity. SED delivers computational savings while maintaining or improving generation quality by skipping zeros during training and inference. Across physics and biology benchmarks, SED matches or surpasses conventional DMs and domain-specific baselines, while vision experiments provide intuitive insights into the limitations of dense DMs and the benefits of SED.
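
The idea of noising only the non-zero values can be illustrated with a minimal sketch. The abstract does not specify how SED is implemented, so everything below is an assumption made for illustration: the function name `noise_nonzeros` is hypothetical, and the noising rule is the standard DDPM-style forward step, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, applied here only at positions where x_0 is non-zero.

```python
import numpy as np

def noise_nonzeros(x0, alpha_bar_t, rng):
    """Hypothetical helper: DDPM-style forward noising on non-zero entries only.

    This is an illustrative assumption, not the paper's implementation.
    """
    mask = x0 != 0                  # sparsity pattern: True where a signal exists
    values = x0[mask]               # 1-D array holding just the non-zero values
    eps = rng.standard_normal(values.shape)  # noise drawn only for those values
    noisy_values = np.sqrt(alpha_bar_t) * values + np.sqrt(1.0 - alpha_bar_t) * eps
    xt = np.zeros_like(x0)          # zero positions are skipped and stay exactly zero
    xt[mask] = noisy_values
    return xt, mask

rng = np.random.default_rng(0)
x0 = np.array([[0.0, 1.5, 0.0],
               [2.0, 0.0, 0.0]])
xt, mask = noise_nonzeros(x0, alpha_bar_t=0.5, rng=rng)
```

In this sketch the noise is sampled for two values instead of six, which is where the computational saving would come from, and the zero positions of `xt` coincide with those of `x0`. A dense forward step would instead add noise to every entry and erase the sparsity pattern.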

Problem

Research questions and friction points this paper is trying to address.

diffusion models

sparse data

exact zeros

sparsity patterns

computational efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparsity-Exploiting Diffusion

diffusion models

sparse data generation