Three Forms of Stochastic Injection for Improved Distribution-to-Distribution Generative Modeling

📅 2025-10-08

🤖 AI Summary

Problem: In distribution-to-distribution generation, particularly in scientific domains such as drug discovery and evolutionary simulation, the source distribution must itself be learned from sparse samples. The resulting lack of supervision causes standard flow matching, which relies on an accurate model of the source distribution, to fail. Method: This paper proposes a stochastic augmentation framework for flow matching built on three complementary randomization mechanisms: (i) structured perturbations applied to the limited source samples; (ii) path-wise noise injected into the flow interpolation trajectory; and (iii) dynamic sampling of the perturbation intensity during training. The method requires no additional labels or prior knowledge of the source distribution. Contribution/Results: The approach mitigates overfitting and trajectory deviation under sparse supervision. Evaluated on five cross-distribution image translation tasks across biology, radiology, and astronomy, it achieves an average FID reduction of 9.0 and a 23% decrease in transport cost, and it yields generated trajectories with improved interpretability and geometric fidelity.
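The three mechanisms listed in the summary can be illustrated with a minimal NumPy sketch of how one training batch might be built. This is not the authors' implementation: the function name `stochastic_fm_batch`, the parameters `max_source_sigma` and `path_sigma`, the uniform intensity schedule, and the bridge-shaped path noise are all assumptions chosen for illustration. The regression target is kept as the straight-line velocity, which is a simplification.

```python
import numpy as np

def stochastic_fm_batch(x0, x1, rng, max_source_sigma=0.5, path_sigma=0.1):
    """Build one flow-matching batch with three illustrative noise injections.

    x0, x1: arrays of shape (batch, dim) with source and target samples.
    Returns (x_t, t, v_target): network input, time, and regression target.
    All hyperparameters here are hypothetical, not taken from the paper.
    """
    batch = x0.shape[0]

    # (iii) Sample a perturbation intensity per example (assumed uniform).
    sigma = rng.uniform(0.0, max_source_sigma, size=(batch, 1))

    # (i) Perturb the limited source samples with Gaussian noise.
    x0_noisy = x0 + sigma * rng.standard_normal(x0.shape)

    # Sample a time in [0, 1) and form the linear interpolant.
    t = rng.uniform(0.0, 1.0, size=(batch, 1))
    mean_t = (1.0 - t) * x0_noisy + t * x1

    # (ii) Add path-wise noise that vanishes at both endpoints (assumed shape).
    path_noise = path_sigma * np.sqrt(t * (1.0 - t)) * rng.standard_normal(x0.shape)
    x_t = mean_t + path_noise

    # Straight-line velocity from the perturbed source to the target.
    v_target = x1 - x0_noisy
    return x_t, t, v_target
```

A velocity network would then be trained to regress `v_target` from `(x_t, t)`. Setting both noise scales to zero recovers the standard linear flow-matching pair.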

📝 Abstract

Modeling transformations between arbitrary data distributions is a fundamental scientific challenge, arising in applications like drug discovery and evolutionary simulation. Flow matching offers a natural framework for this task, but its use has thus far focused primarily on the noise-to-data setting, and its application in the general distribution-to-distribution setting is underexplored. We find that in the latter case, where the source is also a data distribution to be learned from limited samples, standard flow matching fails due to sparse supervision. To address this, we propose a simple and computationally efficient method that injects stochasticity into the training process by perturbing source samples and flow interpolants. On five diverse imaging tasks spanning biology, radiology, and astronomy, our method significantly improves generation quality, outperforming existing baselines by an average of 9 FID points. Our approach also reduces the transport cost between input and generated samples to better highlight the true effect of the transformation, making flow matching a more practical tool for simulating the diverse distribution transformations that arise in science.
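The abstract refers to the transport cost between input and generated samples without defining it here. One common choice, assumed in the sketch below, is the mean squared Euclidean distance between each input and the sample generated from it. The function name `mean_transport_cost` is hypothetical, and the paper may use a different cost.

```python
import numpy as np

def mean_transport_cost(inputs, outputs):
    """Average squared L2 distance between paired inputs and generated outputs.

    inputs, outputs: arrays with matching shapes (n, ...); each sample is
    flattened before the distance is computed. The squared-L2 cost is an
    assumption, not the paper's stated definition.
    """
    inputs = np.asarray(inputs, dtype=float)
    outputs = np.asarray(outputs, dtype=float)
    if inputs.shape != outputs.shape:
        raise ValueError("inputs and outputs must have the same shape")

    # Flatten every sample to a vector so images and vectors are handled alike.
    diff = (outputs - inputs).reshape(inputs.shape[0], -1)
    return float(np.mean(np.sum(diff ** 2, axis=1)))
```

A lower value means the model changes each input less on the way to the target distribution, which is the sense in which a reduced cost isolates the effect of the transformation.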

Problem

Research questions and friction points this paper is trying to address.

Modeling transformations between arbitrary data distributions for scientific applications

Addressing sparse supervision in distribution-to-distribution flow matching

Improving generation quality across diverse imaging domains

Innovation

Methods, ideas, or system contributions that make the work stand out.

Injecting stochasticity into training via sample perturbation

Perturbing flow interpolants to enhance distribution matching

Reducing transport cost between input and output distributions

🔎 Similar Papers

Enhancing Accuracy in Generative Models via Knowledge Transfer