🤖 AI Summary
Problem: In distribution-to-distribution generation, particularly in scientific domains such as drug discovery and evolutionary simulation, the source distribution must itself be learned from a limited number of samples. The resulting sparse supervision causes standard flow matching to fail.
Method: This paper proposes a stochastic augmentation framework for flow matching, introducing three synergistic randomization mechanisms: (i) structured perturbations applied to limited source samples; (ii) path-wise noise injected into the flow interpolation trajectory; and (iii) dynamic sampling of perturbation intensity during training. The method requires no additional labels or prior knowledge of the source distribution.
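The three mechanisms can be sketched as a single batch-construction step. This is a minimal illustration, not the paper's implementation: the function name `make_training_batch`, the isotropic Gaussian perturbation, the uniform sampling of the intensity `sigma`, the linear interpolant, the bridge-shaped path noise, and the default values of `sigma_max` and `path_noise` are all assumptions chosen for clarity.

```python
import numpy as np

def make_training_batch(x0, x1, rng, sigma_max=0.5, path_noise=0.1):
    """Build one stochastic flow-matching batch (illustrative sketch only).

    x0: source samples, shape (B, D), e.g. flattened images.
    x1: target samples, shape (B, D).
    Returns the noisy interpolant, the time, the regression target, and sigma.
    """
    b = x0.shape[0]

    # (iii) Draw a fresh perturbation intensity per example (assumed uniform).
    sigma = rng.uniform(0.0, sigma_max, size=(b, 1))

    # (i) Perturb the limited source samples (assumed isotropic Gaussian).
    x0_tilde = x0 + sigma * rng.standard_normal(x0.shape)

    # Sample a time in [0, 1] and form the linear interpolant.
    t = rng.uniform(0.0, 1.0, size=(b, 1))
    xt = (1.0 - t) * x0_tilde + t * x1

    # (ii) Inject path-wise noise that vanishes at both endpoints
    # (a Brownian-bridge-style scaling, assumed here).
    xt = xt + path_noise * np.sqrt(t * (1.0 - t)) * rng.standard_normal(x0.shape)

    # Velocity regression target of plain linear flow matching; keeping it
    # unchanged under path noise is a simplification of this sketch.
    target = x1 - x0_tilde
    return xt, t, target, sigma
```

A velocity network would then be trained to map `(xt, t)` to `target` with a squared-error loss. Setting `sigma_max=0` and `path_noise=0` recovers the standard deterministic interpolant.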
Contribution/Results: The method effectively mitigates overfitting and trajectory deviation under sparse supervision. Evaluated on five cross-distribution image translation tasks across biology, radiology, and astronomy, the approach achieves an average FID reduction of 9.0 and a 23% decrease in transport cost, and yields generated trajectories with enhanced interpretability and geometric fidelity.
📝 Abstract
Modeling transformations between arbitrary data distributions is a fundamental scientific challenge, arising in applications like drug discovery and evolutionary simulation. While flow matching offers a natural framework for this task, its use has so far focused primarily on the noise-to-data setting, and its application to the general distribution-to-distribution setting remains underexplored. We find that in the latter case, where the source is also a data distribution to be learned from limited samples, standard flow matching fails due to sparse supervision. To address this, we propose a simple and computationally efficient method that injects stochasticity into the training process by perturbing source samples and flow interpolants. On five diverse imaging tasks spanning biology, radiology, and astronomy, our method significantly improves generation quality, outperforming existing baselines by an average of 9 FID points. Our approach also reduces the transport cost between input and generated samples to better highlight the true effect of the transformation, making flow matching a more practical tool for simulating the diverse distribution transformations that arise in science.
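One simple way to quantify the transport cost between inputs and their generated counterparts is the mean squared Euclidean distance over paired samples. The sketch below assumes that definition and a hypothetical helper name `mean_transport_cost`; the exact metric used in the paper is not stated in this summary and may differ.

```python
import numpy as np

def mean_transport_cost(inputs, outputs):
    """Mean squared L2 distance between paired inputs and generated outputs.

    Both arrays have shape (B, ...); the i-th output is assumed to be the
    sample generated from the i-th input.
    """
    # Flatten each sample so the distance is taken over all of its entries.
    diffs = (outputs - inputs).reshape(inputs.shape[0], -1)
    return float(np.mean(np.sum(diffs ** 2, axis=1)))
```

Under this definition, a lower value means the generated samples stay closer to their inputs, so the differences that remain are more attributable to the learned transformation itself.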