🤖 AI Summary
This work addresses the vulnerability of Wide Area Networks (WANs) to severe outages caused by erroneous inputs to the Software-Defined Networking (SDN) controller, which often stem from bugs in the SDN control infrastructure. To mitigate this risk, the authors introduce CrossCheck, a system that validates controller inputs and alerts operators to invalid ones before they cause outages. CrossCheck was evaluated both as a shadow validation system in a production WAN and in simulation, and it is robust to noisy, missing, or buggy telemetry. During the four-week production deployment, it detected the single incident of invalid inputs that occurred, with a 0% false positive rate under normal operation. Simulations further show 100% detection of demand perturbations as small as 5%, and zero false positives with up to 30% of telemetry data corrupted.
📝 Abstract
We present CrossCheck, a system that validates inputs to the Software-Defined Networking (SDN) controller in a Wide Area Network (WAN). By detecting incorrect inputs, which often stem from bugs in the SDN control infrastructure, CrossCheck alerts operators before those inputs trigger network outages. Our analysis at a large-scale WAN operator identifies invalid inputs as a leading cause of major outages, and we show how CrossCheck would have prevented those incidents. We deployed CrossCheck as a shadow validation system for four weeks in a production WAN. During that period it accurately detected the single incident of invalid inputs that occurred while sustaining a 0% false positive rate under normal operation, thus imposing little additional burden on operators. In addition, we show through simulation that CrossCheck reliably detects a wide range of invalid inputs (e.g., detecting demand perturbations as small as 5% with 100% accuracy) and maintains a near-zero false positive rate under realistic levels of noisy, missing, or buggy telemetry data (e.g., sustaining zero false positives with up to 30% of telemetry data corrupted).