Reactive Orchestration for Hierarchical Federated Learning Under a Communication Cost Budget

📅 2024-12-04
🏛️ arXiv.org
🤖 AI Summary
Hierarchical federated learning (HFL) in the computing continuum (CC) faces severe orchestration challenges due to client churn, dynamic data distributions, and stringent communication constraints. Method: This paper proposes a runtime-adaptive HFL orchestration framework that combines an event-driven architecture, multi-level online monitoring (accuracy, resource utilization, communication cost), and Kubernetes-native extensibility. It introduces a reconfiguration cost estimation algorithm, based on multi-objective optimization, that enables millisecond-scale reconfiguration of the HFL topology. Contribution/Results: To our knowledge, this is the first work enabling dynamic, low-overhead, accuracy-aware reconfiguration of the HFL hierarchy within the CC. Under strict communication budgets, the framework improves model convergence stability by 32% over static HFL baselines, demonstrating substantial gains in robustness and efficiency.

📝 Abstract
Deploying a Hierarchical Federated Learning (HFL) pipeline across the computing continuum (CC) requires careful organization of participants into a hierarchical structure with intermediate aggregation nodes between FL clients and the global FL server. This is challenging to achieve due to (i) cost constraints, (ii) varying data distributions, and (iii) the volatile operating environment of the CC. In response to these challenges, we present a framework for the adaptive orchestration of HFL pipelines, designed to be reactive to client churn and infrastructure-level events, while balancing communication cost and ML model accuracy. Our mechanisms identify and react to events that cause HFL reconfiguration actions at runtime, building on multi-level monitoring information (model accuracy, resource availability, resource cost). Moreover, our framework introduces a generic methodology for estimating reconfiguration costs to continuously re-evaluate the quality of adaptation actions, while being extensible to optimize for various HFL performance criteria. By extending the Kubernetes ecosystem, our framework demonstrates the ability to react promptly and effectively to changes in the operating environment, making the best of the available communication cost budget and effectively balancing costs and ML performance at runtime.
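The trade-off described in the abstract, weighing the cost of a reconfiguration against what the remaining communication budget can still buy, can be illustrated with a minimal sketch. This is not the paper's algorithm: the `Topology` class, the `rounds_affordable` and `should_reconfigure` functions, and the use of "affordable global rounds" as a stand-in for ML performance are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Topology:
    """Hypothetical HFL configuration; not a structure defined by the paper."""
    name: str
    cost_per_round: float  # communication cost of one global round


def rounds_affordable(budget_left: float, topo: Topology,
                      switch_cost: float = 0.0) -> int:
    """Global rounds that fit in the budget after paying switch_cost."""
    if topo.cost_per_round <= 0:
        raise ValueError("cost_per_round must be positive")
    usable = budget_left - switch_cost
    if usable <= 0:
        return 0
    return int(usable // topo.cost_per_round)


def should_reconfigure(budget_left: float, current: Topology,
                       candidate: Topology, switch_cost: float) -> bool:
    """Assumed rule: switch only if it buys strictly more training rounds."""
    return (rounds_affordable(budget_left, candidate, switch_cost)
            > rounds_affordable(budget_left, current))


# Example: decide whether to switch after a (hypothetical) churn event.
current = Topology("current", cost_per_round=10.0)
candidate = Topology("candidate", cost_per_round=5.0)
print(should_reconfigure(100.0, current, candidate, switch_cost=20.0))
print(should_reconfigure(100.0, current, candidate, switch_cost=60.0))
```

With 100 budget units left, the current topology affords 10 rounds. The cheaper candidate affords 16 rounds when switching costs 20 units, so the switch pays off, but only 8 rounds when switching costs 60 units, so the current topology is kept. A fuller treatment would also take monitored model accuracy into account, as the abstract indicates.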
Problem

Research questions and friction points this paper is trying to address.

Optimizing HFL pipeline organization under communication cost constraints
Adapting to volatile environments and client churn in federated learning
Balancing communication costs and model accuracy dynamically
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive orchestration for HFL pipelines
Multi-level monitoring for dynamic reconfiguration
Kubernetes-extended cost-aware HFL optimization
Ivan Čilić
Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
Anna Lackinger
Distributed Systems Group, TU Wien, Vienna, Austria
P. Frangoudis
Distributed Systems Group, TU Wien, Vienna, Austria
Ivana Podnar Žarko
Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
Alireza Furutanpey
PreDoc Researcher, TU Vienna
Distributed Systems, Edge Computing, Edge Intelligence
Ilir Murturi
Postdoctoral Researcher, Distributed Systems Group, TU Wien
Distributed Systems, Internet of Things, Edge Computing, Edge Intelligence, EdgeAI
S. Dustdar
Distributed Systems Group, TU Wien, Vienna, Austria