CountsDiff: A Diffusion Model on the Natural Numbers for Generation and Imputation of Count-Based Data

📅 2026-04-04
🤖 AI Summary
Existing diffusion models handle discrete, ordered count data, such as single-cell RNA sequencing (scRNA-seq) measurements, poorly. This work proposes CountsDiff, a diffusion model defined directly on the natural numbers, which brings continuous-time training, classifier-free guidance, and non-monotone reverse trajectories (churn or remasking) to count data generation. It reparameterizes the Blackout diffusion framework in terms of a survival probability schedule and an explicit loss weighting, which simplifies the formulation. On scRNA-seq imputation tasks, the method matches or exceeds a state-of-the-art discrete generative model and leading imputation methods; it is also validated on the natural image datasets CIFAR-10 and CelebA.
📝 Abstract
Diffusion models have excelled at generative tasks for both continuous and token-based domains, but their application to discrete ordinal data remains underdeveloped. We present CountsDiff, a diffusion framework designed to natively model distributions on the natural numbers. CountsDiff extends the Blackout diffusion framework by simplifying its formulation through a direct parameterization in terms of a survival probability schedule and an explicit loss weighting. This introduces flexibility through design parameters with direct analogues in existing diffusion modeling frameworks. Beyond this reparameterization, CountsDiff introduces features from modern diffusion models, previously absent in counts-based domains, including continuous-time training, classifier-free guidance, and churn/remasking reverse dynamics that allow non-monotone reverse trajectories. We propose an initial instantiation of CountsDiff and validate it on natural image datasets (CIFAR-10, CelebA), exploring the effects of varying the introduced design parameters in a complex, well-studied, and interpretable data domain. We then highlight biological count assays as a natural use case, evaluating CountsDiff on single-cell RNA-seq imputation in a fetal cell and heart cell atlas. Remarkably, we find that even this simple instantiation matches or surpasses the performance of a state-of-the-art discrete generative model and leading RNA-seq imputation methods, while leaving substantial headroom for further gains through optimized design choices in future work.
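As a rough illustration of what a survival probability schedule means for count data, the sketch below corrupts a vector of counts by binomial thinning: each unit of each count independently survives to time t with probability p(t). This assumes a pure-death forward process of the kind used in Blackout diffusion; the linear schedule and the names `survival_probability` and `thin_counts` are hypothetical choices made for this example and are not taken from the paper.

```python
import random


def survival_probability(t: float) -> float:
    """Hypothetical linear schedule: p(0) = 1 (clean data), p(1) = 0 (all zeros)."""
    return 1.0 - t


def thin_counts(x0, t, rng):
    """Binomial thinning: every unit in every count survives independently
    with probability p(t), so each output is Binomial(x0_i, p(t))."""
    p = survival_probability(t)
    return [sum(rng.random() < p for _ in range(n)) for n in x0]


rng = random.Random(0)           # seeded for reproducibility
x0 = [0, 3, 12, 40]              # toy count vector (e.g. reads per gene)
for t in (0.0, 0.5, 1.0):
    print(t, thin_counts(x0, t, rng))
```

At t = 0 the counts are returned unchanged, at t = 1 every count is zero, and intermediate times give counts between 0 and the original value. A generative model in this setting would be trained to undo the thinning; how CountsDiff parameterizes and weights that reverse step is not specified in this summary.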
Problem

Research questions and friction points this paper is trying to address.

count-based data
discrete ordinal data
diffusion models
data imputation
generative modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion model
count data
discrete generative modeling
classifier-free guidance
single-cell RNA-seq imputation
Renzo G. Soatto
Laboratory for Information and Decision Systems, MIT, Cambridge, MA, USA; Massachusetts Institute of Technology, Cambridge, MA, USA; Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Anders Hoel
Laboratory for Information and Decision Systems, MIT, Cambridge, MA, USA; Massachusetts Institute of Technology, Cambridge, MA, USA
Greycen Ren
Massachusetts Institute of Technology, Cambridge, MA, USA
Shorna Alam
Massachusetts Institute of Technology, Cambridge, MA, USA
Stephen Bates
Assistant Professor, MIT EECS
Statistics, Machine Learning, Artificial Intelligence, Uncertainty Quantification
Nikolaos P. Daskalakis
McLean Hospital, Harvard Medical School, Harvard University, Belmont, MA, USA; Boston University, Boston, MA, USA
Caroline Uhler
Massachusetts Institute of Technology
mathematical statistics (graphical models, causal inference, algebraic statistics), computational biology (gene regulation, chro
Maria Skoularidou
Massachusetts Institute of Technology, Cambridge, MA, USA; Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA; McLean Hospital, Harvard Medical School, Harvard University, Belmont, MA, USA