🤖 AI Summary
In decentralized wireless networks, multiple source–destination pairs face a fundamental trade-off between maximizing their own throughput and ensuring network-wide fairness when autonomously learning spectrum access policies over a limited number of orthogonal frequency bands. To address this, we propose FSRL (Fair Share RL), a fully decentralized reinforcement learning framework that achieves fairness without coordination. FSRL combines three key ingredients: (i) state augmentation with a semi-adaptive time reference, (ii) an architecture that leverages risk control and time-difference likelihood, and (iii) a fairness-driven reward structure. Each source learns solely from the outcome of its own transmissions (success or collision), with no information exchange and no prior knowledge of the network size or of other sources' strategies. Evaluated across more than 50 heterogeneous network settings, FSRL improves fairness (measured by Jain's fairness index) over a common baseline RL algorithm by up to 89.0% in stringent single-band, multi-source conditions, and by 48.1% on average, while improving both individual throughput and system-wide fairness.
📝 Abstract
We consider a decentralized wireless network with several source–destination pairs sharing a limited number of orthogonal frequency bands. Sources learn to adapt their transmissions (specifically, their band selection strategy) over time, in a decentralized manner, without sharing information with each other. Sources can only observe the outcome of their own transmissions (i.e., success or collision), and have no prior knowledge of the network size or of the transmission strategies of other sources. The goal of each source is to maximize its own throughput while striving for network-wide fairness. We propose a novel fully decentralized Reinforcement Learning (RL)-based solution that achieves fairness without coordination. The proposed Fair Share RL (FSRL) solution combines: (i) state augmentation with a semi-adaptive time reference; (ii) an architecture that leverages risk control and time difference likelihood; and (iii) a fairness-driven reward structure. We evaluate FSRL in more than 50 network settings with different numbers of agents and different amounts of available spectrum, in the presence of jammers, and in an ad-hoc setting. Simulation results suggest that, compared with a common baseline RL algorithm from the literature, FSRL can be up to 89.0% fairer (as measured by Jain's fairness index) in stringent settings with several sources and a single frequency band, and 48.1% fairer on average.
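The fairness metric used throughout is Jain's fairness index, which for per-source throughputs x_1, …, x_n is (Σx)² / (n·Σx²), ranging from 1/n (one source captures everything) to 1 (perfectly equal shares). A minimal sketch (the function name `jain_index` is ours, not the paper's):

```python
def jain_index(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Returns 1.0 for perfectly equal allocations and 1/n when a
    single source captures all throughput.
    """
    n = len(throughputs)
    total = sum(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if sum_sq == 0:
        return 1.0  # no traffic at all; treat as trivially fair
    return (total * total) / (n * sum_sq)

# Four sources sharing one band:
print(jain_index([1.0, 1.0, 1.0, 1.0]))  # equal shares -> 1.0
print(jain_index([4.0, 0.0, 0.0, 0.0]))  # one source dominates -> 0.25
```

A relative fairness gain such as the reported 89.0% would then compare this index for FSRL against the baseline policy under the same network setting.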