🤖 AI Summary
This work addresses a systematic O(1/n) bias in the minibatch centroid estimates of drifting models. The bias is caused by the self-normalization of the softmax weights, degrades the accuracy of the estimated centroids, and is most pronounced at small batch sizes. The authors propose Analytical Bias Correction (ABC). ABC approximates the leading bias term from within-batch statistics and corrects the empirical centroid with a closed-form plug-in estimator. It reduces the bias to O(1/n²), adds no first-order increase in total variance, and keeps the corrected centroid inside the convex hull of the samples. On CIFAR-10, ABC lowers FID and trains faster, with the largest gains at small batch sizes. Toy experiments further confirm the predicted O(1/n) and O(1/n²) bias scaling.
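The source of the $O(1/n)$ term can be made concrete with a standard second-order (delta-method) expansion of a ratio estimator. The derivation below is our own illustration under stated assumptions, not the paper's. It assumes a query point $x$, samples $y_1, \dots, y_n$ drawn i.i.d., and positive unnormalized weights $w_i = k(x, y_i)$. For example, $k(x,y) = \exp(-\|x - y\|/\tau)$ with temperature $\tau$, which is an assumed kernel. Vector-valued quantities are handled coordinate-wise. The second line expands $a/b$ to second order around $(\mathbb{E}[w\,y], \mathbb{E}[w])$ and uses $\mathbb{E}[w\,y] = c\,\mathbb{E}[w]$. The third line replaces the population moments by in-batch averages.

$$
\begin{aligned}
\hat c &= \frac{\sum_{i=1}^n w_i y_i}{\sum_{i=1}^n w_i} = \sum_{i=1}^n p_i y_i, \qquad p_i = \frac{w_i}{\sum_{j} w_j}, \qquad c = \frac{\mathbb{E}[w\,y]}{\mathbb{E}[w]},\\
\mathbb{E}[\hat c] - c &= \frac{c\,\mathrm{Var}(w) - \mathrm{Cov}(w\,y,\, w)}{n\,\mathbb{E}[w]^2} + O(1/n^2) = -\frac{\mathbb{E}\!\left[w^2 (y - c)\right]}{n\,\mathbb{E}[w]^2} + O(1/n^2),\\
\hat b &= -\frac{\sum_{i} w_i^2 (y_i - \hat c)}{\left(\sum_{i} w_i\right)^2} = -\sum_{i=1}^n p_i^2 (y_i - \hat c),\\
\tilde c &= \hat c - \hat b = \hat c + \sum_{i=1}^n p_i^2 (y_i - \hat c) = \sum_{i=1}^n p_i \left(1 + p_i - s\right) y_i, \qquad s = \sum_{j=1}^n p_j^2 .
\end{aligned}
$$

Because $1/n \le s \le 1$, each coefficient $p_i(1 + p_i - s)$ is nonnegative. The coefficients sum to $1 + s - s = 1$, so $\tilde c$ remains a convex combination of the batch samples. The plug-in $\hat b$ is itself biased, but under standard moment conditions only by $O(1/n^2)$, so the residual bias of $\tilde c$ is $O(1/n^2)$. These match the properties the abstract attributes to ABC. Whether ABC uses exactly this estimator is an assumption.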
📝 Abstract
Drifting models are capable one-step generative models trained to follow a drifting field. The field combines attractive and repulsive softmax-weighted centroids over the data and current-generator distributions. In practice, only a minibatch of $n$ samples from each distribution is available, and each centroid is approximated by an empirical estimate. In this paper, we begin by showing that the minibatch centroid is in general a biased estimator of the target centroid, with a pointwise $O(1/n)$ bias arising from softmax self-normalization. Correcting this bias requires the expectation over the full distribution, which is intractable. We instead approximate the leading bias term from in-batch statistics and propose Analytical Bias Correction (ABC), a closed-form plug-in adjustment. We prove that ABC reduces the bias from $O(1/n)$ to $O(1/n^2)$, introduces no first-order increase in total variance, and preserves convex-hull containment of the corrected centroid. In practice, ABC requires only two additional lines of code and has negligible wall-time overhead under compiled execution. Toy experiments confirm the theoretical $O(1/n)$ and $O(1/n^2)$ scaling. On CIFAR-10, ABC reduces FID and trains faster, with the largest gains at small $n$, where the bias is most significant.
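The toy check below is a minimal sketch, not the paper's code. It assumes 1-D samples $y \sim \mathcal{N}(0, 1)$, a query $x = 1.5$, and the kernel weight $\exp(-|x - y|/\tau)$ with $\tau = 1$. The helpers `centroids`, `true_centroid` and `mean_bias` are hypothetical. The correction it applies, $\tilde c = \hat c + \sum_i p_i^2 (y_i - \hat c)$, is the standard delta-method plug-in for self-normalized estimators and is assumed to be in the spirit of ABC. The script compares the Monte Carlo bias of the plain softmax-weighted centroid with the corrected one across batch sizes.

```python
import math
import random


def centroids(x, ys, tau):
    """Return (c_hat, c_tilde) for a query x and a 1-D minibatch ys.

    c_hat is the softmax-weighted centroid with logits -|x - y_i| / tau.
    c_tilde adds a hypothetical delta-method plug-in bias correction.
    """
    w = [math.exp(-abs(x - y) / tau) for y in ys]  # assumed kernel, values in (0, 1]
    total = sum(w)
    p = [wi / total for wi in w]  # self-normalized (softmax) weights
    c_hat = sum(pi * yi for pi, yi in zip(p, ys))
    # Estimated leading bias is -sum_i p_i^2 (y_i - c_hat); subtracting it adds this term.
    c_tilde = c_hat + sum(pi * pi * (yi - c_hat) for pi, yi in zip(p, ys))
    return c_hat, c_tilde


def true_centroid(x, tau, lo=-12.0, hi=12.0, steps=100_000):
    """Population centroid E[w y] / E[w] for y ~ N(0, 1), via the trapezoid rule."""
    h = (hi - lo) / steps
    num = den = 0.0
    for k in range(steps + 1):
        y = lo + k * h
        end_weight = 0.5 if k in (0, steps) else 1.0
        # Unnormalized Gaussian density times kernel; 1/sqrt(2*pi) cancels in the ratio.
        mass = end_weight * math.exp(-0.5 * y * y - abs(x - y) / tau)
        num += mass * y
        den += mass
    return num / den


def mean_bias(n, x=1.5, tau=1.0, trials=20_000, seed=0):
    """Monte Carlo estimate of E[estimate] - c for both estimators at batch size n."""
    rng = random.Random(seed)
    c = true_centroid(x, tau)
    naive_total = corrected_total = 0.0
    for _ in range(trials):
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        c_hat, c_tilde = centroids(x, ys, tau)
        naive_total += c_hat
        corrected_total += c_tilde
    return naive_total / trials - c, corrected_total / trials - c


if __name__ == "__main__":
    for n in (4, 8, 16):
        naive, corrected = mean_bias(n)
        print(f"n={n:3d}  naive bias={naive:+.4f}  corrected bias={corrected:+.4f}")
```

When $n$ doubles, the naive bias should roughly halve, which is the $O(1/n)$ signature. The corrected bias should be much smaller and shrink faster, though at this trial count it soon becomes comparable to Monte Carlo noise. In high dimension one would normalize the logits with a max-subtraction (log-sum-exp) for numerical stability. The 1-D sketch skips this because its weights already lie in $(0, 1]$.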