🤖 AI Summary
This work addresses two problems: centralized SDN control loops react too slowly to bursty traffic, and existing learning-based approaches rely on offline training, which makes them fragile under distribution shifts. The authors propose PolicyCache-SDN, a hierarchical traffic control framework in which edge nodes adapt online while adhering to global policy constraints. The controller compiles network-wide intent into policy envelopes that bound the action space per path; edge agents make real-time metering, queueing, and rerouting decisions only within those bounds, which keeps local actions auditable and reversible. Combining centralized policy compilation with edge-based reinforcement learning, the approach is evaluated on a 1,024-host software SDN testbed: it improves average core link utilization by 35.5% over Static ECMP and 18.3% over Centralized TE, reduces elephant-flow P99 flow completion time by 34.3% relative to end-host congestion control, and lowers SLA violation rates from 18.2% to 6.8%. Each edge agent uses less than 2% CPU and 12 MB of memory.
📝 Abstract
Software-defined networks offer global visibility, yet centralized control loops are too slow for transient congestion and bursty traffic dynamics. Existing learned traffic control schemes often rely on offline training, making them fragile under distribution shifts. We present PolicyCache-SDN, a hierarchical SDN traffic control framework that enables local online adaptation under centralized policy control. Its key abstraction is a policy envelope: the controller compiles network-wide intent into bounded per-path action spaces, while edge agents learn and execute metering, queueing, and rerouting decisions only within those bounds. Policy envelopes also make local actions auditable and reversible when they affect shared bottlenecks. Evaluation on a 1,024-host software SDN testbed shows that PolicyCache-SDN improves average core link utilization by 35.5% over Static ECMP and by 18.3% over Centralized TE. It reduces elephant-flow P99 flow completion time (FCT) by 34.3% relative to end-host congestion control, lowers SLA violations from 18.2% to 6.8%, and uses less than 2% CPU and 12 MB of memory per edge agent.
The source code is available in an anonymized repository at https://anonymous.4open.science/r/JCC2026-PolicyCache-SDN/.
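The policy envelope idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `PolicyEnvelope`, `Action`, `project`, and `EdgeAgent`, the choice of fields (rate bounds, allowed queues, allowed next hops), and the clamp-or-fall-back rule are all assumptions made for illustration. The sketch shows only the general pattern: a learned action is projected into controller-compiled bounds, logged for audit, and can be reverted to a controller-approved default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyEnvelope:
    """Hypothetical per-path bounds compiled by the controller."""
    path_id: str
    min_rate_mbps: float
    max_rate_mbps: float
    allowed_queues: frozenset
    allowed_next_hops: frozenset


@dataclass(frozen=True)
class Action:
    """Hypothetical edge action covering metering, queueing, rerouting."""
    rate_mbps: float
    queue_id: int
    next_hop: str


def project(action: Action, env: PolicyEnvelope, fallback: Action) -> Action:
    """Force an action into the envelope (assumed rule, for illustration).

    The meter rate is clamped to [min, max]; a queue or next hop outside
    the allowed sets is replaced by the fallback's value.
    """
    rate = min(max(action.rate_mbps, env.min_rate_mbps), env.max_rate_mbps)
    queue = action.queue_id if action.queue_id in env.allowed_queues else fallback.queue_id
    hop = action.next_hop if action.next_hop in env.allowed_next_hops else fallback.next_hop
    return Action(rate, queue, hop)


class EdgeAgent:
    """Applies proposed actions only after projecting them into the envelope."""

    def __init__(self, env: PolicyEnvelope, default: Action):
        self.env = env
        self.default = default   # controller-approved action, assumed in bounds
        self.current = default
        self.audit_log = []      # (proposed, applied) pairs for later audit

    def apply(self, proposed: Action) -> Action:
        applied = project(proposed, self.env, self.default)
        self.audit_log.append((proposed, applied))
        self.current = applied
        return applied

    def rollback(self) -> Action:
        """Revert to the controller-approved default action."""
        self.current = self.default
        return self.current


env = PolicyEnvelope("p1", 100.0, 800.0, frozenset({0, 1}), frozenset({"s2", "s3"}))
default = Action(400.0, 0, "s2")
agent = EdgeAgent(env, default)

# The proposed rate and queue fall outside the envelope; the next hop is allowed.
applied = agent.apply(Action(1200.0, 5, "s3"))
print(applied)
```

In this sketch the learner is free to propose anything, and the projection step is the only place where bounds are enforced, so the audit log records exactly how far each proposal deviated from what was applied.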