AI Summary
This work addresses scheduling across autonomous IoT silos, where infrastructure heterogeneity, non-IID workloads, and adversarial threats prevent existing approaches from handling heterogeneous state-action spaces while remaining robust. It proposes DeFRiS, a decentralized federated reinforcement learning framework for efficient, privacy-preserving collaborative scheduling. DeFRiS introduces an action-space-agnostic policy to facilitate cross-silo knowledge transfer, a local update mechanism combining GAE with clipped policy optimization to cope with sparse and delayed rewards, and a dual-track aggregation protocol that integrates gradient fingerprint similarity detection with gradient tracking to enhance robustness under non-IID and adversarial conditions. Experiments across 20 heterogeneous silos demonstrate a 6.4% reduction in average response time, 7.2% lower energy consumption, a 10.4% decrease in tail latency risk (CVaR$_{0.95}$), near-zero deadline violations, and, relative to the best-performing baseline, over 3× better performance retention as the system scales and over 8× better stability in adversarial settings.
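The tail latency risk metric reported above, CVaR at the 0.95 level, can be read as the mean response time of the worst 5% of requests. The sketch below is a minimal empirical estimator under that reading. The function name `cvar` and the rounding rule for the tail size are illustrative assumptions, not taken from the paper, which may use a different estimator.

```python
def cvar(samples, alpha=0.95):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of samples.

    Assumption: larger values are worse (e.g. response times), and the
    tail size is rounded to the nearest whole sample, with a minimum of one.
    """
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # round() rather than ceil() avoids an off-by-one caused by
    # floating-point error in (1 - alpha).
    k = max(1, round(len(ordered) * (1 - alpha)))
    tail = ordered[-k:]
    return sum(tail) / len(tail)


# Hypothetical response times of 1..100 ms: the worst 5% are 96..100.
response_times = [float(i) for i in range(1, 101)]
tail_risk = cvar(response_times, alpha=0.95)
```

Unlike a plain 95th-percentile latency, this average is sensitive to how bad the slowest requests are, not only to where the tail begins.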
Abstract
Next-generation IoT applications increasingly span across autonomous administrative entities, necessitating silo-cooperative scheduling to leverage diverse computational resources while preserving data privacy. However, realizing efficient cooperation faces significant challenges arising from infrastructure heterogeneity, Non-IID workload shifts, and the inherent risks of adversarial environments. Existing approaches, relying predominantly on centralized coordination or independent learning, fail to address the incompatibility of state-action spaces across heterogeneous silos and lack robustness against malicious attacks. This paper proposes DeFRiS, a Decentralized Federated Reinforcement Learning framework for robust and scalable Silo-cooperative IoT application scheduling. DeFRiS integrates three synergistic innovations: (i) an action-space-agnostic policy utilizing candidate resource scoring to enable seamless knowledge transfer across heterogeneous silos; (ii) a silo-optimized local learning mechanism combining Generalized Advantage Estimation (GAE) with clipped policy updates to resolve sparse delayed reward challenges; and (iii) a Dual-Track Non-IID robust decentralized aggregation protocol leveraging gradient fingerprints for similarity-aware knowledge transfer and anomaly detection, and gradient tracking for optimization momentum. Extensive experiments on a distributed testbed with 20 heterogeneous silos and realistic IoT workloads demonstrate that DeFRiS significantly outperforms state-of-the-art baselines, reducing average response time by 6.4% and energy consumption by 7.2%, while lowering tail latency risk (CVaR$_{0.95}$) by 10.4% and achieving near-zero deadline violations. Furthermore, DeFRiS achieves over 3 times better performance retention as the system scales and over 8 times better stability in adversarial environments compared to the best-performing baseline.
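To make innovation (ii) concrete, the following minimal sketch combines Generalized Advantage Estimation with a clipped policy objective. It assumes a PPO-style clipped surrogate, which the abstract does not name explicitly. The function names and the default values of `gamma`, `lam`, and `clip_eps` are illustrative assumptions rather than the settings used in DeFRiS.

```python
import math


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for a single trajectory.

    `values` must hold len(rewards) + 1 entries; the last one is the
    bootstrap value of the final state (0.0 if the episode terminated).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the discounted sum of later TD errors.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """PPO-style clipped objective (to be maximised), averaged over steps."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes the incentive to push the ratio
        # outside the [1 - clip_eps, 1 + clip_eps] band.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# A sparse, delayed reward: only the last step pays out.
adv = gae_advantages([0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=0.5, lam=1.0)
```

In this toy trajectory the single terminal reward is propagated back to earlier steps with geometric discounting, which is the property that makes GAE useful when rewards are sparse and delayed. The clipping then bounds how far one local update can move the policy.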