🤖 AI Summary
This work addresses nondeterminism in single-node federated learning simulations, which arises from shared random state and scheduling variability among concurrently trained clients and forces researchers to trade throughput for reproducibility. The authors propose BlazeFL, a lightweight framework that runs clients as threads in a free-threaded, shared-memory setting and exchanges parameters in memory, avoiding serialization and inter-process communication overhead. Each client receives an isolated random number generator (RNG) stream; under a fixed software/hardware stack, and when stochastic operators consume the framework-managed generators, repeated high-concurrency runs produce bitwise-identical results. In CIFAR-10 image-classification experiments, BlazeFL achieves up to a 3.1× speedup over a widely used open-source baseline on communication-dominated workloads while keeping a lightweight dependency footprint.
📝 Abstract
Federated learning (FL) research increasingly relies on single-node simulations with hundreds or thousands of virtual clients, making both efficiency and reproducibility essential. Yet parallel client training often introduces nondeterminism through shared random state and scheduling variability, forcing researchers to trade throughput for reproducibility or to implement custom control logic within complex frameworks. We present BlazeFL, a lightweight framework for single-node FL simulation that alleviates this trade-off through free-threaded shared-memory execution and deterministic randomness management. BlazeFL uses thread-based parallelism with in-memory parameter exchange between the server and clients, avoiding serialization and inter-process communication overhead. To support deterministic execution, BlazeFL assigns isolated random number generator (RNG) streams to clients. Under a fixed software/hardware stack, and when stochastic operators consume BlazeFL-managed generators, this design yields bitwise-identical results across repeated high-concurrency runs in both thread-based and process-based modes. In CIFAR-10 image-classification experiments, BlazeFL substantially reduces execution time relative to a widely used open-source baseline, achieving up to 3.1× speedup on communication-dominated workloads while preserving a lightweight dependency footprint. Our open-source implementation is available at: https://github.com/kitsuyaazuma/blazefl.
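The idea of isolated per-client RNG streams can be illustrated with a minimal standard-library sketch. This is not BlazeFL's API: the names `client_seed`, `train_client`, and `run_round`, the hash-based seed derivation, and the toy noise-adding "training" step are all hypothetical and chosen only for illustration. The sketch assumes that each client draws randomness solely from its own generator and that the server aggregates updates in a fixed client order.

```python
import hashlib
import random
from concurrent.futures import ThreadPoolExecutor


def client_seed(base_seed: int, client_id: int) -> int:
    """Derive a per-client seed from (base_seed, client_id); hypothetical scheme."""
    digest = hashlib.sha256(f"{base_seed}:{client_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def train_client(rng: random.Random, weights: list[float]) -> list[float]:
    """Toy local update: perturb the shared weights using only this client's RNG."""
    return [w + rng.gauss(0.0, 0.1) for w in weights]


def run_round(base_seed: int, num_clients: int, max_workers: int) -> list[float]:
    """Run one simulated round with thread-based clients and in-memory exchange."""
    weights = [0.0, 0.0, 0.0]  # read-only global parameters shared in memory
    # One isolated generator per client; no client touches the global random state.
    rngs = [random.Random(client_seed(base_seed, cid)) for cid in range(num_clients)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in submission order, whatever the scheduling was.
        updates = list(pool.map(lambda cid: train_client(rngs[cid], weights),
                                range(num_clients)))
    # Aggregate in fixed client order so the floating-point summation order is fixed.
    return [sum(u[i] for u in updates) / num_clients for i in range(len(weights))]


if __name__ == "__main__":
    sequential = run_round(base_seed=42, num_clients=16, max_workers=1)
    concurrent = run_round(base_seed=42, num_clients=16, max_workers=8)
    print(sequential == concurrent)
```

Because no client reads from a shared generator, the order in which threads are scheduled does not change which random numbers each client sees. Real training adds further sources of nondeterminism, such as nondeterministic GPU kernels, which is why the abstract conditions its guarantee on a fixed software/hardware stack.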