🤖 AI Summary
To address the difficulty of host-device memory management and of porting complex workloads to GPUs in large-scale network traffic analysis, this paper describes a multi-GPU implementation of the Anonymized Network Sensing Graph Challenge built on the C++26 `std::execution` (Senders) asynchronous programming model. The Senders model treats GPUs as first-class execution resources rather than as accelerators driven by a host-centric program, so multi-GPU workloads can be expressed as composable chains of asynchronous operations with standardized semantics. The paper argues that this generic, productive programming model need not sacrifice critical-path performance relative to low-level vendor-specific models. Evaluated on a system with eight NVIDIA A100 GPUs, the library-based implementation achieves up to a 55× speedup over the reference serial GraphBLAS baseline.
📝 Abstract
Large-scale network sensing plays a vital role in network traffic analysis and characterization. As network packet data grows ever larger, parallel methods have become mainstream for network analytics. While effective, GPU-based implementations still face start-up challenges such as host-device memory management and porting complex workloads to devices. To mitigate these challenges, composable frameworks built on modern C++ have emerged for efficiently deploying analytics tasks on GPUs. Specifically, the recent C++26 Senders model of asynchronous operation chaining provides a simple interface for pushing bulk tasks to different device execution contexts.
Given the prominence of contemporary dense-GPU platforms and vendor-supplied software libraries, such a programming model treats GPUs as first-class execution resources (in contrast to traditional host-centric programming models), allowing convenient development of multi-GPU application workloads through expressive and standardized asynchronous semantics. In this paper, we discuss practical aspects of implementing the Anonymized Network Sensing Graph Challenge on dense-GPU systems using the recently proposed C++26 Senders model. Adopting a generic and productive programming model need not compromise critical-path performance relative to low-level proprietary vendor programming models: our commodity library-based implementation achieves up to a 55x speedup on eight NVIDIA A100 GPUs over the reference serial GraphBLAS baseline.