🤖 AI Summary
This work investigates the robustness of decentralised learning under node failures and abrupt data distribution shifts. We propose a multi-configuration distributed training framework grounded in dynamic graph modelling and phase-wise convergence analysis, augmented by simulation of heterogeneous data perturbations, to systematically uncover the non-trivial coupling between knowledge persistence and residual connectivity. Empirically, we show that preserving even a minimal amount of local data suffices to guarantee both global model recoverability and knowledge retention at isolated nodes. Experiments demonstrate sustained classification accuracy even under large-scale node failures or complete isolation, validating strong robustness against concurrent structural and statistical perturbations. Our core contribution is a quantitative characterisation of the relationship between knowledge survivability and topological resilience, yielding verifiable design principles for robust decentralised learning systems.
📝 Abstract
In the vibrant landscape of AI research, decentralised learning is gaining momentum. Decentralised learning allows individual nodes to keep data locally, where they are generated, and to share knowledge extracted from local data among themselves through an interactive process of collaborative refinement. This paradigm supports scenarios where data cannot leave local nodes, whether for privacy or sovereignty reasons or because real-time constraints require models to stay close to the locations where inference must be carried out. The distributed nature of decentralised learning raises significant new research challenges compared with centralised learning. Among them, in this paper we focus on robustness. Specifically, we study the effect of node disruption on the collective learning process. Assuming that a given percentage of "central" nodes disappears from the network, we consider different cases, characterised by (i) different distributions of data across nodes and (ii) different times at which disruption occurs relative to the start of the collaborative learning task. Through these configurations, we show the non-trivial interplay between the properties of the network connecting nodes, the persistence (or loss) of knowledge acquired collectively before disruption, and the effect of data availability before and after disruption. Our results show that decentralised learning processes are remarkably robust to network disruption. As long as even minimal amounts of data remain available somewhere in the network, the learning process is able to recover from disruptions and achieve significant classification accuracy. Recovery clearly varies with the connectivity remaining after disruption, but we show that even nodes left completely isolated can retain significant knowledge acquired before the disruption.
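The disruption scenario sketched in the abstract — removing a given percentage of "central" nodes and inspecting the residual connectivity — can be illustrated with a minimal, self-contained simulation. This is not the paper's code: the hub-and-spoke topology, the use of degree as the centrality measure, and the 20% removal fraction are all illustrative assumptions.

```python
def remove_central_nodes(adj, fraction):
    """Drop the given fraction of highest-degree ('central') nodes.

    `adj` maps each node to the set of its neighbours; degree is used
    here as a simple stand-in for centrality (an assumption).
    """
    ranked = sorted(adj, key=lambda n: len(adj[n]), reverse=True)
    removed = set(ranked[:int(len(ranked) * fraction)])
    return {n: {m for m in nbrs if m not in removed}
            for n, nbrs in adj.items() if n not in removed}

def components(adj):
    """Connected components of the residual graph (iterative DFS)."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# Toy hub-and-spoke topology: node 0 is the single central hub.
adj = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {0, 3}}
residual = remove_central_nodes(adj, 0.2)  # top 20% by degree -> node 0
print(sorted(map(sorted, components(residual))))  # -> [[1, 2], [3, 4]]
```

Removing the hub splits the toy network into two isolated fragments; in the paper's terms, each fragment must then continue learning with whatever data and previously acquired knowledge it retains.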