🤖 AI Summary
In hazardous environments, UAV teams can generate only limited experience because of safety, energy, and mission-duration constraints, so the conventional federated learning assumption that greater participation yields better performance may not hold. This work proposes EC-HFRL, a hierarchical federated framework that treats each cluster as a federated learning agent whose intra-cluster learners draw on a shared experience pool to enable experience reuse. The analysis and experiments indicate that learning performance is associated mainly with the experience reuse strategy and the dominance of key gradient transition experiences within a cluster, rather than with the number of participating learners. Minibatch size primarily determines effective replay exposure, while higher intra-cluster participation increases the reuse level. The observed performance regimes therefore reflect the structure of the learning signal rather than federated aggregation effects, and learner participation plays a limited, secondary role.
📝 Abstract
Conventional federated learning assumes that greater learner participation improves training performance by leveraging abundant, independently generated local data. However, this assumption may break down in federated reinforcement learning (FRL) for unmanned aerial vehicle (UAV) teams in hazardous environments, where experience generation is severely constrained by safety considerations, energy limitations, and mission duration. This work introduces Experience-Constrained Hierarchical Federated Reinforcement Learning (EC-HFRL), a framework in which clusters act as federated learning agents, while multiple intra-cluster learners represent parallel learning resources that reuse a shared experience pool. We show that increasing participation does not necessarily improve learning performance. Instead, learning performance is strongly associated with the experience reuse strategy and with the dominance, within a cluster, of key gradient transition experiences that are identified analytically. In particular, minibatch size primarily determines effective replay exposure, while higher intra-cluster participation increases the reuse level. Empirical results demonstrate that performance regimes are strongly associated with the structure of the learning signal rather than with federated aggregation effects, clarifying the limited and secondary role of learner participation in experience-constrained FRL.
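To make the relationship between minibatch size, intra-cluster participation, and reuse concrete, the following is a minimal sketch and not the EC-HFRL implementation. It assumes a fixed shared pool of transitions, uniform sampling without replacement within each minibatch, and a "mean reuse" measure defined here as total draws divided by pool size. The function `simulate_reuse`, its parameters, and both measures are hypothetical illustrations; the abstract does not define replay exposure or reuse level formally.

```python
import random
from collections import Counter


def simulate_reuse(pool_size, num_learners, minibatch_size, steps, seed=0):
    """Count how often each transition in a fixed shared pool is sampled.

    Illustrative assumptions (not taken from the paper):
    - the pool holds `pool_size` transitions and never grows,
    - every learner samples uniformly from the same pool,
    - each minibatch is drawn without replacement.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_learners):
        for _ in range(steps):
            # One learner draws one minibatch of transition indices.
            batch = rng.sample(range(pool_size), minibatch_size)
            counts.update(batch)

    total_draws = num_learners * steps * minibatch_size
    mean_reuse = total_draws / pool_size   # average times a transition is replayed
    coverage = len(counts) / pool_size     # fraction of the pool sampled at least once
    return mean_reuse, coverage, counts


if __name__ == "__main__":
    for learners in (1, 4):
        for batch in (16, 64):
            mean_reuse, coverage, _ = simulate_reuse(
                pool_size=1000, num_learners=learners,
                minibatch_size=batch, steps=50,
            )
            print(f"learners={learners} batch={batch} "
                  f"mean_reuse={mean_reuse:.2f} coverage={coverage:.2f}")
```

Under these assumptions, adding learners or enlarging the minibatch both replay the same fixed set of transitions more often and add no new experience, which is the sense in which participation raises reuse rather than data volume.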