🤖 AI Summary
To address the challenges of resource-constrained heterogeneous clients and the high communication/computation overhead of multi-model federated learning (MMFL), this paper proposes a resource-aware dynamic client sampling framework. We first establish a general convergence theory for MMFL. Building on this analysis, we design MMFL-LVR, a loss-aware, low-variance sampling algorithm, and extend it to MMFL-StaleVR, which incorporates stale-gradient correction, along with MMFL-StaleVRE, a lightweight variant for low-overhead deployment. Our approach jointly integrates variance reduction, resource-aware scheduling, and stale-update compensation to ensure convergence while significantly reducing system overhead. Experiments demonstrate that the proposed methods improve average accuracy by up to 19.1% over random sampling, coming within 5.4% of the theoretical optimum of full client participation, while substantially lowering both communication and computation costs.
📝 Abstract
Federated learning (FL) allows edge devices to collaboratively train models without sharing local data. As FL gains popularity, clients may need to train multiple unrelated FL models, but communication constraints limit their ability to train all models simultaneously. While clients could train FL models sequentially, opportunistically having FL clients concurrently train different models -- termed multi-model federated learning (MMFL) -- can reduce the overall training time. Prior work uses simple client-to-model assignments that do not optimize the contribution of each client to each model over the course of its training. Prior work on single-model FL shows that intelligent client selection can greatly accelerate convergence, but naïve extensions to MMFL can violate heterogeneous resource constraints at both the server and the clients. In this work, we develop a novel convergence analysis of MMFL with arbitrary client sampling methods, theoretically demonstrating the strengths and limitations of previous well-established gradient-based methods. Motivated by this analysis, we propose MMFL-LVR, a loss-based sampling method that minimizes training variance while explicitly respecting communication limits at the server and reducing computational costs at the clients. We extend this to MMFL-StaleVR, which incorporates stale updates for improved efficiency and stability, and MMFL-StaleVRE, a lightweight variant suitable for low-overhead deployment. Experiments show our methods improve average accuracy by up to 19.1% over random sampling, with only a 5.4% gap from the theoretical optimum (full client participation).
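The abstract's core idea -- sampling clients in proportion to their training loss while respecting a per-round communication budget, then reweighting updates to stay unbiased -- can be sketched in a few lines. This is not the paper's MMFL-LVR algorithm; it is a minimal illustration of loss-proportional, variance-reducing client sampling, and every name and the exact weighting below are assumptions for illustration only:

```python
import random

def sample_clients(losses, budget, rng=None):
    """Pick up to `budget` clients for one model, with probability
    proportional to each client's last reported training loss
    (a loss-proportional, importance-sampling heuristic -- not the
    paper's exact method)."""
    rng = rng or random.Random(0)
    total = sum(losses.values())
    probs = {c: l / total for c, l in losses.items()}
    pool = dict(probs)
    chosen = []
    for _ in range(min(budget, len(pool))):
        # Weighted pick without replacement from the remaining pool.
        r = rng.random() * sum(pool.values())
        acc, pick = 0.0, None
        for c, p in pool.items():
            acc += p
            pick = c
            if r <= acc:
                break
        chosen.append(pick)
        del pool[pick]
    # Inverse-probability weights keep the aggregated update unbiased
    # in expectation, the usual importance-sampling correction.
    weights = {c: 1.0 / probs[c] for c in chosen}
    return chosen, weights
```

For example, with `losses = {"a": 1.0, "b": 2.0, "c": 3.0}` and a budget of 2, client "c" is the likeliest first pick, and its lower inverse-probability weight offsets its higher selection chance. The budget cap stands in for the server-side communication limit the abstract describes.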