🤖 AI Summary
This work investigates the implicit regularization mechanisms underlying mainstream federated learning algorithms (FedAvg, FedSAM, and SCAFFOLD) under non-IID data distributions, and identifies the root causes of their differing convergence behavior. It introduces backward error analysis, new to federated learning theory, to quantitatively characterize the first- and second-order implicit regularization biases of these methods in non-convex settings. The analysis reveals that FedAvg implicitly amplifies gradient variance across clients; FedSAM partially mitigates the first-order bias; and SCAFFOLD eliminates the first-order bias entirely but retains a residual second-order bias. This unified framework explains fundamental performance limits of the three algorithms, and empirical results confirm the theoretical predictions. The key contribution is the first backward-error-based analytical paradigm for implicit regularization in federated optimization, providing a new theoretical foundation for understanding and designing robust federated optimizers.
📝 Abstract
Backward error analysis finds a modified loss function that the parameter updates actually follow under a given optimization method. The additional loss terms included in this modified function are called the implicit regularizer. In this paper, we derive the implicit regularizers of several federated learning algorithms on non-IID data distributions and explain why each method exhibits different convergence behavior. We first show that the implicit regularizer of FedAvg disperses each client's gradient away from the average gradient, thus increasing the gradient variance; we also show empirically that this implicit regularizer hampers convergence. We then compute the implicit regularizers of FedSAM and SCAFFOLD and explain why they converge better. While existing convergence analyses focus on the advantages of FedSAM and SCAFFOLD, our approach also explains their limitations in complex non-convex settings. Specifically, we demonstrate that FedSAM can only partially remove the bias in the first-order term of FedAvg's implicit regularizer, whereas SCAFFOLD fully eliminates the first-order bias but not the second-order one. Consequently, the implicit regularizer provides useful insight into the convergence behavior of federated learning from a different theoretical perspective.
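To make the core idea of backward error analysis concrete, the following sketch illustrates the classic single-machine result (Barrett and Dherin, 2021) on which this line of work builds: one gradient descent step with learning rate η on a loss f tracks the gradient flow of the modified loss f̃(x) = f(x) + (η/4)·f′(x)² more closely than the flow of f itself, the extra term being the implicit regularizer. This is a toy 1-D illustration, not the paper's federated derivation; the choice f(x) = x⁴/4 and all numerical settings are arbitrary assumptions for demonstration.

```python
eta = 0.1   # learning rate of the discrete GD step
x0 = 1.0    # starting point

# f(x) = x**4 / 4, so f'(x) = x**3.
f_grad = lambda x: x**3
# Gradient of the modified loss f_tilde = f + (eta/4) * f'(x)**2:
# f_tilde'(x) = f'(x) + (eta/2) * f'(x) * f''(x) = x**3 + (3*eta/2) * x**5.
mod_grad = lambda x: x**3 + (3 * eta / 2) * x**5

def flow(grad, x, t, n=10000):
    """Integrate the gradient flow dx/dt = -grad(x) for time t with RK4."""
    h = t / n
    for _ in range(n):
        k1 = -grad(x)
        k2 = -grad(x + h * k1 / 2)
        k3 = -grad(x + h * k2 / 2)
        k4 = -grad(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x

gd_step = x0 - eta * f_grad(x0)                    # one discrete GD step
err_orig = abs(gd_step - flow(f_grad, x0, eta))    # distance to flow of f
err_mod = abs(gd_step - flow(mod_grad, x0, eta))   # distance to flow of f_tilde
print(err_orig, err_mod)  # the modified flow is the closer match
```

The gap between the discrete step and the original flow shrinks like O(η²), while the gap to the modified flow shrinks like O(η³); the paper's contribution is to carry out this kind of analysis for the client/server update structure of FedAvg, FedSAM, and SCAFFOLD.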