🤖 AI Summary
This work addresses low channel utilization in underwater acoustic networks, where long propagation delays and node mobility jointly introduce significant spatiotemporal uncertainty. To overcome this limitation, the paper proposes MobiU-MAC, a ranging-free medium access control (MAC) protocol based on deep reinforcement learning (DRL) that autonomously learns to maximize throughput under dynamic topologies and varying delays. The core contribution is the CHILL-STER algorithm: the CHILL-Return mechanism enables stable policy learning despite asynchronous delayed rewards, while the STER mechanism uses spatiotemporal experience replay to cope with topology changes caused by node mobility. Theoretical analysis shows that, under long propagation delays and without ranging information, DRL can learn an optimal policy equivalent to that of a standard Markov decision process. Experimental results show that MobiU-MAC outperforms existing DRL-based MAC protocols in complex underwater environments by exploiting the system's maximum delay bound while eliminating ranging overhead.
📝 Abstract
Long propagation delays in underwater acoustic networks (UWANs) cause spatio-temporal uncertainty, constraining channel utilization in medium access control (MAC) protocols. Node mobility in autonomous underwater vehicle (AUV) scenarios exacerbates these challenges by introducing dynamic propagation delays and varying spatial topologies. We present MobiU-MAC, a deep reinforcement learning (DRL)-based MAC protocol for mobile node access in UWANs that maximizes throughput through autonomous learning. MobiU-MAC incorporates CHILL-STER, a novel DRL algorithm tailored to UWANs that is both ranging-free and delay-robust. CHILL-STER employs a credit horizon-limited $\lambda$-return (CHILL-Return) mechanism to achieve stable learning under asynchronous delayed rewards, while the companion spatio-temporal experience replay (STER) mechanism handles topological changes arising from node mobility. This work also shows theoretically that, under long propagation delays and without ranging, DRL can learn an optimal policy equivalent to that of a standard Markov decision process (MDP). Performance evaluations indicate that MobiU-MAC outperforms existing DRL-based MAC protocols for UWANs by exploiting the maximum system delay bound without ranging overhead, supporting the effectiveness of the proposed theory and algorithm in complex, dynamic underwater environments.
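The abstract does not give the CHILL-Return formula. For orientation only, the sketch below computes a standard truncated (horizon-limited) λ-return, in which credit for an action is spread over at most `horizon` future steps and the remainder is bootstrapped from a value estimate. Mapping `horizon` to the number of slots spanned by the maximum system delay bound is an assumption made for illustration, not the paper's definition; the function names `n_step_return` and `truncated_lambda_return` are hypothetical.

```python
from typing import Sequence


def n_step_return(rewards: Sequence[float], values: Sequence[float],
                  t: int, n: int, gamma: float) -> float:
    """n-step return G_{t:t+n} = sum_{k<n} gamma^k * R_{t+k+1} + gamma^n * V(S_{t+n}).

    Convention (assumed): rewards[i] is the reward that follows the action
    taken at step i, and values[i] is V(S_i); len(values) == len(rewards) + 1.
    """
    g = 0.0
    for k in range(n):
        g += gamma ** k * rewards[t + k]
    # Bootstrap the tail beyond n steps from the value estimate.
    return g + gamma ** n * values[t + n]


def truncated_lambda_return(rewards: Sequence[float], values: Sequence[float],
                            t: int, horizon: int, gamma: float,
                            lam: float) -> float:
    """Horizon-limited lambda-return (generic technique, not CHILL-Return itself).

    Weights the n-step returns for n = 1..H-1 by (1 - lam) * lam^(n-1) and gives
    the remaining weight lam^(H-1) to the H-step return, so the weights sum to 1.
    H is `horizon` clipped to the number of rewards available after step t.
    """
    if horizon < 1 or not 0 <= t < len(rewards):
        raise ValueError("need horizon >= 1 and 0 <= t < len(rewards)")
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have one more entry than rewards")
    h = min(horizon, len(rewards) - t)
    g = 0.0
    for n in range(1, h):
        g += (1 - lam) * lam ** (n - 1) * n_step_return(rewards, values, t, n, gamma)
    g += lam ** (h - 1) * n_step_return(rewards, values, t, h, gamma)
    return g


if __name__ == "__main__":
    r = [1.0, 0.0, 2.0]          # hypothetical per-slot rewards
    v = [0.5, 0.4, 0.3, 0.2]     # hypothetical value estimates V(S_0..S_3)
    print(truncated_lambda_return(r, v, t=0, horizon=2, gamma=0.9, lam=0.5))
```

With `lam=0` the target reduces to the one-step TD target, and with `lam=1` it becomes the full H-step return; capping the horizon bounds how long an update must wait for rewards, which is the kind of trade-off a delay-robust return has to manage.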