🤖 AI Summary
Intermittent faults in decentralized robot swarms are difficult to detect and localize, which compromises the reliability of collaborative tasks. This paper proposes a proactive-reactive fault-tolerance framework built on a multiplex network: (1) a backup communication layer is self-organized on top of the persistent network topology provided by the self-organizing nervous system (SoNS) approach; (2) distributed consensus combined with one-shot likelihood ratio tests enables early fault identification; and (3) self-organized rerouting temporarily redirects communication until the detected fault resolves. To our knowledge, this is the first work to use SoNS-based persistent structures for intermittent fault handling, achieving early detection and rerouting without centralized coordination. Experimental evaluation in formation control tasks reports +32.7% fault detection rate and −41.5% false alarm rate, effective containment of fault propagation, and convergence to the desired formations despite intermittent faults.
📝 Abstract
Intermittent faults are transient errors that sporadically appear and disappear. Although intermittent faults pose substantial challenges to reliability and coordination, existing studies of fault tolerance in robot swarms focus instead on permanent faults. One reason for this is that intermittent faults are prohibitively difficult to detect in the fully self-organized ad-hoc networks typical of robot swarms, as their network topologies are transient and often unpredictable. However, in the recently introduced self-organizing nervous systems (SoNS) approach, robot swarms are able to self-organize persistent network structures for the first time, easing the problem of detecting intermittent faults. To address intermittent faults in robot swarms that have persistent networks, we propose a novel proactive-reactive strategy for detection and mitigation, based on self-organized backup layers and distributed consensus in a multiplex network. Proactively, the robots self-organize dynamic backup paths before faults occur, adapting to changes in the primary network topology and the robots' relative positions. Reactively, robots use one-shot likelihood ratio tests to compare information received along different paths in the multiplex network, enabling early fault detection. Upon detection, communication is temporarily rerouted in a self-organized way, until the detected fault resolves. We validate the approach in representative scenarios of faulty positional data occurring during formation control, demonstrating that intermittent faults are prevented from disrupting convergence to desired formations, with high fault detection accuracy and low rates of false positives.
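To make the reactive step concrete, the following is a minimal sketch of a one-shot likelihood ratio test that compares a position report received over the primary path with the same report received over a backup path. It is an illustration under stated assumptions, not the paper's implementation: the function names, the zero-mean isotropic Gaussian noise model, and the values of `sigma_ok`, `sigma_fault`, and `threshold` are all hypothetical, and the distributed consensus step is not modeled.

```python
import math


def log_likelihood_ratio(primary, backup, sigma_ok=0.05, sigma_fault=1.0):
    """Log-likelihood ratio of 'fault' vs. 'no fault' for one pair of 2-D reports.

    Assumption (not from the paper): the discrepancy between the two reports is
    zero-mean isotropic Gaussian, with standard deviation sigma_ok when both
    paths are healthy and sigma_fault (larger) when one path carries faulty data.
    """
    dx = primary[0] - backup[0]
    dy = primary[1] - backup[1]
    sq_err = dx * dx + dy * dy
    # log N(e; 0, sigma_fault^2 I) - log N(e; 0, sigma_ok^2 I) in two dimensions
    return (sq_err * (0.5 / sigma_ok**2 - 0.5 / sigma_fault**2)
            - 2.0 * math.log(sigma_fault / sigma_ok))


def detect_fault(primary, backup, threshold=0.0):
    """Single-sample decision: True if the fault hypothesis is more likely."""
    return log_likelihood_ratio(primary, backup) > threshold


# Reports that agree up to sensor noise are accepted.
print(detect_fault((1.00, 2.00), (1.02, 1.99)))  # False
# A large discrepancy between the two paths is flagged.
print(detect_fault((1.00, 2.00), (1.80, 2.50)))  # True
```

In this sketch, a `True` result would be the trigger for temporarily rerouting communication; raising `threshold` trades detection sensitivity for fewer false positives.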