BlazeFL: Fast and Deterministic Federated Learning Simulation

📅 2026-04-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses nondeterminism in single-node federated learning simulations, which arises from shared random state and scheduling variability among concurrently trained clients and forces researchers to trade throughput for reproducibility. The authors propose BlazeFL, a lightweight framework that runs clients as threads under free-threaded shared-memory execution and exchanges parameters in memory, avoiding inter-process communication and serialization overhead. Each client receives an isolated random number generator stream; under a fixed software/hardware stack, and when stochastic operators draw from these managed generators, repeated high-concurrency runs produce bitwise-identical results. The framework keeps a lightweight dependency footprint, and in CIFAR-10 image-classification experiments it achieves up to a 3.1× speedup over a widely used open-source baseline on communication-dominated workloads.
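The difference between in-memory exchange and serialized exchange can be illustrated with a minimal standard-library sketch. It is not BlazeFL's implementation: the names `make_params`, `exchange_in_memory`, and `exchange_serialized` are hypothetical, and `pickle` stands in for whatever serialization a process boundary would require.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor


def make_params(n):
    """Build a toy 'global model' (hypothetical stand-in for real parameters)."""
    return {"w": [0.0] * n}


def client_receive(params):
    # A thread worker gets a reference to the server's object; nothing is copied.
    return params


def exchange_in_memory(params):
    """Hand parameters to a client thread and return what the client saw."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(client_receive, params).result()


def exchange_serialized(params):
    """Mimic a process boundary: serialize, then rebuild a separate copy."""
    payload = pickle.dumps(params)
    return pickle.loads(payload), len(payload)


if __name__ == "__main__":
    server_params = make_params(1000)
    seen_by_thread = exchange_in_memory(server_params)
    copy, payload_bytes = exchange_serialized(server_params)
    print(seen_by_thread is server_params)  # same object, shared memory
    print(copy is server_params)            # distinct object, rebuilt from bytes
    print(payload_bytes > 0)
```

The thread-based path passes a reference, so its cost does not grow with model size, whereas the serialized path pays for encoding, transfer, and decoding on every exchange.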

📝 Abstract

Federated learning (FL) research increasingly relies on single-node simulations with hundreds or thousands of virtual clients, making both efficiency and reproducibility essential. Yet parallel client training often introduces nondeterminism through shared random state and scheduling variability, forcing researchers to trade throughput for reproducibility or to implement custom control logic within complex frameworks. We present BlazeFL, a lightweight framework for single-node FL simulation that alleviates this trade-off through free-threaded shared-memory execution and deterministic randomness management. BlazeFL uses thread-based parallelism with in-memory parameter exchange between the server and clients, avoiding serialization and inter-process communication overhead. To support deterministic execution, BlazeFL assigns isolated random number generator (RNG) streams to clients. Under a fixed software/hardware stack, and when stochastic operators consume BlazeFL-managed generators, this design yields bitwise-identical results across repeated high-concurrency runs in both thread-based and process-based modes. In CIFAR-10 image-classification experiments, BlazeFL substantially reduces execution time relative to a widely used open-source baseline, achieving up to 3.1× speedup on communication-dominated workloads while preserving a lightweight dependency footprint. Our open-source implementation is available at: https://github.com/kitsuyaazuma/blazefl.
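The idea of isolated per-client RNG streams can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not BlazeFL's API: `client_rng`, `local_update`, and `run_round` are hypothetical names, the seed derivation via SHA-256 is one possible choice, and the "training" step is a placeholder that only draws random numbers.

```python
import hashlib
import random
from concurrent.futures import ThreadPoolExecutor


def client_rng(base_seed, client_id):
    """Derive an isolated generator for one client (hypothetical helper)."""
    digest = hashlib.sha256(f"{base_seed}:{client_id}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def local_update(job):
    base_seed, client_id, global_weight = job
    rng = client_rng(base_seed, client_id)
    # Placeholder for local training: every stochastic draw uses the
    # client's own generator, never the shared module-level state.
    noise = sum(rng.gauss(0.0, 1.0) for _ in range(100))
    return client_id, global_weight + 0.01 * noise


def run_round(base_seed, num_clients, max_workers):
    """Run one simulated round and return the averaged client result."""
    jobs = [(base_seed, cid, 1.0) for cid in range(num_clients)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = dict(pool.map(local_update, jobs))
    # Aggregate in fixed client order so the floating-point summation
    # order does not depend on which thread finished first.
    return sum(results[cid] for cid in range(num_clients)) / num_clients


if __name__ == "__main__":
    sequential = run_round(base_seed=0, num_clients=32, max_workers=1)
    concurrent = run_round(base_seed=0, num_clients=32, max_workers=8)
    print(sequential == concurrent)
```

Because each client's draws depend only on the base seed and its own identifier, the result is independent of thread count and scheduling order; the fixed aggregation order covers the remaining source of variation in this toy setting.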

Problem

Research questions and friction points this paper is trying to address.

federated learning

simulation

determinism

reproducibility

parallelism

Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning

Deterministic Simulation

Thread-based Parallelism