🤖 AI Summary
This work investigates why flow matching models remain effective even without explicit time conditioning, revealing that temporal information in high-dimensional data can be directly inferred from noisy observations. By decomposing the time-unconditioned flow matching loss and leveraging the spiked-covariance model, subspace concentration phenomena, and statistical estimation theory, the authors demonstrate that the so-called time-blindness gap becomes negligible in high dimensions. Moreover, they establish that the choice of coupling design is substantially more critical than explicit time conditioning. Experiments on CIFAR-10, CelebA-HQ, and FFHQ confirm that altering the coupling scheme has a far greater impact on both training loss and sample quality than removing time conditioning, underscoring that time identifiability fundamentally arises from the geometric structure of the data manifold.
📝 Abstract
Recent work has shown that flow matching models can be trained without explicit time conditioning, challenging the standard view that the interpolation time is needed to disambiguate velocity targets. But why should a time-blind model work at all? Decomposing the time-blind flow matching loss, we identify two sources of irreducible error: a coupling variance, which arises from ambiguous velocity targets induced by how noise and data points are paired, and the time-blindness gap, which is the additional error caused by ignoring time. This gap shows that time-blind training is strictly harder than conventional training, reinforcing the puzzle that time-blind models work so well in practice. We resolve this tension by showing that the geometry of high-dimensional data makes time identifiable directly from noisy observations. When data concentrates near a $k$-dimensional subspace, time can be recovered from the statistical structure of noisy interpolants in directions orthogonal to the data; under a spiked-covariance model, this yields a closed-form estimator that recovers $t$ from a single observation $z$ at rate $O(1/\sqrt{d-k})$ for ambient dimension $d$. As a consequence, we prove that the time-blindness gap is asymptotically negligible relative to the coupling variance. We empirically demonstrate our identifiability result on real-world data and show that changing the coupling has a much larger effect on loss and sample quality than removing time conditioning across CIFAR-10, CelebA-HQ, and FFHQ. These results explain why time-blind flow matching works and show that the main practical lever is the choice of coupling, not explicit time conditioning.
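The identifiability claim can be illustrated with a small numerical sketch. It is not the paper's estimator: it assumes the linear interpolant $z = (1-t)\,\varepsilon + t\,x$ with $\varepsilon \sim \mathcal{N}(0, I_d)$, data $x$ lying exactly in a known $k$-dimensional subspace, and a hypothetical helper `estimate_time`. Under these assumptions the component of $z$ orthogonal to the subspace is pure noise scaled by $(1-t)$, so its norm concentrates around $(1-t)\sqrt{d-k}$ and $t$ can be read off from one sample.

```python
import numpy as np


def estimate_time(z, basis, sigma=1.0):
    """Estimate t from a single interpolant z = (1 - t) * noise + t * data.

    Assumptions (illustrative, not taken from the paper):
      - basis is a (d, k) matrix with orthonormal columns spanning the data subspace
      - noise is isotropic Gaussian with standard deviation sigma
      - data has no energy outside the subspace
    """
    d, k = basis.shape
    # Remove the in-subspace component; what remains is (1 - t) * noise_perp.
    residual = z - basis @ (basis.T @ z)
    # ||noise_perp|| concentrates around sigma * sqrt(d - k).
    scale = np.linalg.norm(residual) / (sigma * np.sqrt(d - k))
    return 1.0 - scale


rng = np.random.default_rng(0)
d, k, t_true = 4096, 16, 0.3

# Random orthonormal basis for the assumed data subspace.
basis, _ = np.linalg.qr(rng.standard_normal((d, k)))
data = basis @ (5.0 * rng.standard_normal(k))
noise = rng.standard_normal(d)

z = (1.0 - t_true) * noise + t_true * data
t_hat = estimate_time(z, basis)
print(f"true t = {t_true:.3f}, estimated t = {t_hat:.3f}")
```

In this toy setting the estimation error shrinks as $d-k$ grows, in line with the $O(1/\sqrt{d-k})$ rate stated in the abstract; with real data the subspace would have to be estimated and the orthogonal directions would carry some data variance.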