OLAF: Programmable Data Plane Acceleration for Asynchronous Distributed Reinforcement Learning

📅 2025-07-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address model update latency and staleness caused by network congestion in asynchronous distributed reinforcement learning, this paper proposes an inline acceleration architecture leveraging programmable data planes. The method introduces three key innovations: (1) a dynamic queue mechanism supporting in-network model update aggregation; (2) the Age-of-Model (AoM) metric to quantify update freshness, integrated with in-network feedback to ensure global fairness and responsiveness; and (3) lightweight transport control combined with formal verification to minimize redundant traffic. Experimental evaluation demonstrates that the architecture significantly reduces model staleness and network congestion, achieving up to a 2.3× speedup in convergence rate and substantially outperforming baseline approaches in training efficiency.

📝 Abstract
Asynchronous Distributed Reinforcement Learning (DRL) can suffer from degraded convergence when model updates become stale, often the result of network congestion and packet loss during large-scale training. This work introduces a network data-plane acceleration architecture that mitigates such staleness by enabling inline processing of DRL model updates as they traverse the accelerator engine. To this end, we design and prototype a novel queueing mechanism that opportunistically combines compatible updates sharing a network element, reducing redundant traffic and preserving update utility. Complementing this, we provide a lightweight transmission control mechanism at the worker nodes that is guided by feedback from the in-network accelerator. To assess model utility at line rate, we introduce the Age-of-Model (AoM) metric as a proxy for staleness and verify global fairness and responsiveness properties using a formal verification method. Our evaluations demonstrate that this architecture significantly reduces update staleness and congestion, ultimately improving the convergence rate in asynchronous DRL workloads.
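The paper does not give the AoM formula here, but a plausible reading is version-based staleness: how many global model versions have elapsed since a worker last synchronized before its update arrives. The class name and counting scheme below are illustrative assumptions, not the paper's definition:

```python
class AoMTracker:
    """Illustrative Age-of-Model tracker (assumed version-based definition):
    the age of an update is the number of global model versions committed
    since the sending worker last pulled the model."""

    def __init__(self):
        self.global_version = 0  # version of the latest aggregated model

    def apply_update(self):
        """Called whenever an update is merged into the global model."""
        self.global_version += 1

    def aom(self, update_base_version: int) -> int:
        """Staleness of an incoming update relative to the current model."""
        return self.global_version - update_base_version


tracker = AoMTracker()
base = tracker.global_version  # worker pulls the model at version 0
tracker.apply_update()         # two other workers' updates land first
tracker.apply_update()
print(tracker.aom(base))       # -> 2: this update is two versions stale
```

An in-network accelerator could use such a counter to prioritize fresh updates or drop ones whose AoM exceeds a threshold.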
Problem

Research questions and friction points this paper is trying to address.

Mitigates model update staleness in asynchronous DRL
Reduces network congestion and redundant traffic
Improves convergence rate in distributed reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Inline processing of DRL model updates
Opportunistic queueing combining compatible updates
Lightweight transmission control with feedback
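One way to picture the opportunistic queueing idea: when an arriving update is "compatible" with one already queued at the same network element (here assumed to mean computed against the same base model version), the two are merged into a single aggregate rather than occupying two queue slots. The dict layout and compatibility test are assumptions for illustration, not the paper's wire format:

```python
def enqueue_with_aggregation(queue, update):
    """Sketch of in-queue aggregation: merge gradient updates that share a
    base model version by summing them, instead of enqueueing a duplicate.
    `base_version` and `grad` are hypothetical field names."""
    for queued in queue:
        if queued["base_version"] == update["base_version"]:
            # Compatible: combine in place, track how many were merged.
            queued["grad"] = [a + b for a, b in zip(queued["grad"], update["grad"])]
            queued["count"] += 1
            return queue
    update["count"] = 1
    queue.append(update)
    return queue


q = []
enqueue_with_aggregation(q, {"base_version": 3, "grad": [1.0, 2.0]})
enqueue_with_aggregation(q, {"base_version": 3, "grad": [0.5, 0.5]})
print(len(q), q[0]["grad"])  # -> 1 [1.5, 2.5]: two updates, one queue entry
```

This halves the queued traffic for compatible updates while preserving the summed gradient, which is the "reducing redundant traffic and preserving update utility" trade-off the abstract describes.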