🤖 AI Summary
High-dimensional count data, such as those from single-cell RNA sequencing and neural spike trains, lack efficient and natural generative modeling approaches for mapping distributions across batches or time points. This work proposes count-FM, a method that adapts flow matching to the discrete count domain through a continuous-time birth-death process that directly models local unit jumps in count space, thereby learning marginal transports between arbitrary count distributions. Crucially, count-FM avoids categorical or continuous approximations, offering simulation-free training, parameter efficiency, and interpretable transport paths. Experiments on synthetic data and on real-world scRNA-seq and neural spike-train data show that count-FM improves on representative baselines in sample quality and modeling efficiency while yielding interpretable transport paths.
📝 Abstract
High-dimensional count data arise in applications such as single-cell RNA sequencing and neural spike trains, where mapping between distributions across successive batches or time points is a critical component of data analysis. The recent success of diffusion- and flow-based deep generative models for images, video, and text motivates extending these ideas to count-valued settings, but many existing methods either treat each count as a categorical state or transform counts into a continuous space, neither of which is natural or efficient when the count range is large. We propose count-FM, a flow-matching framework for count data based on a continuous-time birth-death process with local unit jumps. Count-FM learns marginal transitions efficiently in count space through simulation-free training of conditional transition rates, allowing transport between arbitrary count-distributed source and target populations. In simulation, count-FM achieves better sample quality than representative baselines while using substantially fewer parameters. We further apply count-FM to scRNA-seq and neural spike-train data for unconditional generation, transport, and conditional generation. Across these tasks, count-FM yields improved sample quality, greater modeling efficiency, and interpretable transport paths.
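To make the notion of a birth-death process with local unit jumps concrete, the following is a minimal sketch of sampling one coordinate of such a process with the standard Gillespie algorithm. It is not the count-FM implementation: the function name `simulate_birth_death` and the hand-written rate functions are hypothetical stand-ins for learned transition rates, and the rates are assumed to depend only on the current count. Rates in a flow-matching model would generally also depend on time, which would call for thinning or small time steps instead.

```python
import random


def simulate_birth_death(x0, birth_rate, death_rate, t_end, rng):
    """Sample a 1-D birth-death chain on {0, 1, 2, ...} up to time t_end.

    birth_rate and death_rate are callables x -> rate >= 0. They are
    illustrative placeholders, not the learned rates from the paper.
    """
    x, t = x0, 0.0
    path = [(t, x)]
    while True:
        b = birth_rate(x)
        d = death_rate(x) if x > 0 else 0.0  # counts cannot go below zero
        total = b + d
        if total <= 0.0:
            break  # absorbing state: no further jumps are possible
        t += rng.expovariate(total)  # exponential waiting time to next jump
        if t >= t_end:
            break
        # Each jump moves the count by exactly one unit, up or down.
        x += 1 if rng.random() < b / total else -1
        path.append((t, x))
    return x, path


rng = random.Random(0)
final_count, path = simulate_birth_death(
    x0=3,
    birth_rate=lambda x: 2.0,       # constant birth rate (assumption)
    death_rate=lambda x: 0.5 * x,   # death rate proportional to the count
    t_end=5.0,
    rng=rng,
)
print(final_count, len(path))
```

Because every transition changes the count by one, the sampled path stays on the integer lattice and can be read directly as a sequence of unit increases and decreases, which is the sense in which such transport paths are interpretable.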