🤖 AI Summary
To address the high per-client proximal computation overhead and severe convergence degradation caused by data heterogeneity in nonconvex composite federated learning (FL), this paper proposes FedCanon. Our method decouples local client updates from proximal operations, requiring only a single proximal evaluation per round at the server; it further introduces a control variate mechanism incorporating the global gradient to explicitly model and mitigate data bias. Theoretically, FedCanon is the first algorithm for nonconvex composite FL achieving single-server-proximal-evaluation per round, guaranteeing sublinear convergence without bounded heterogeneity assumptions—and linear convergence under the Polyak–Łojasiewicz (PL) condition. Empirically, FedCanon outperforms state-of-the-art methods across multiple heterogeneous benchmarks: it improves model accuracy, reduces proximal computation cost by 3–5×, decreases communication rounds by 20%–35%, and yields more stable convergence.
📝 Abstract
Composite federated learning offers a general framework for solving machine learning problems with additional regularization terms. However, many existing methods require clients to perform multiple proximal operations to handle non-smooth terms and their performance are often susceptible to data heterogeneity. To overcome these limitations, we propose a novel composite federated learning algorithm called extbf{FedCanon}, designed to solve the optimization problems comprising a possibly non-convex loss function and a weakly convex, potentially non-smooth regularization term. By decoupling proximal mappings from local updates, FedCanon requires only a single proximal evaluation on the server per iteration, thereby reducing the overall proximal computation cost. It also introduces control variables that incorporate global gradient information into client updates, which helps mitigate the effects of data heterogeneity. Theoretical analysis demonstrates that FedCanon achieves sublinear convergence rates under general non-convex settings and linear convergence under the Polyak-{L}ojasiewicz condition, without relying on bounded heterogeneity assumptions. Experiments demonstrate that FedCanon outperforms the state-of-the-art methods in terms of both accuracy and computational efficiency, particularly under heterogeneous data distributions.