🤖 AI Summary
Problem: Existing benchmarks lack systematic evaluation of Data Processing Unit (DPU) capabilities for data-intensive workloads. Method: We propose the first DPU-specific benchmark suite tailored for data processing, built upon a scalable abstraction framework that uniquely supports heterogeneous DPU architectures and multiple data processing stacks (including network I/O, memory bandwidth, coprocessor acceleration, and storage offloading), enabling cross-platform, modular performance assessment. Contribution/Results: Evaluated across mainstream DPU platforms, our suite demonstrates 1.8×–5.3× throughput improvements over CPUs on representative workloads such as query processing, compression, and encryption. It is the first to quantitatively characterize the performance benefits and fundamental bottlenecks of DPU offloading, thereby establishing a rigorous foundation for DPU data-processing evaluation and closing a critical gap in the systems benchmarking landscape.
📝 Abstract
Data processing units (DPUs, SoC-based SmartNICs) are emerging data center hardware that provides opportunities to address cloud data processing challenges. Their onboard compute, memory, network, and auxiliary storage can be leveraged to offload a variety of data processing tasks. Although recent work shows promising benefits of DPU offloading for specific operations, a comprehensive view of the implications of DPUs for data processing is missing. Benchmarking can help, but existing benchmark tools lack a focus on data processing and are limited to specific DPUs. In this paper, we present dpBento, a benchmark suite that aims to uncover the performance characteristics of different DPU resources and different DPUs, as well as the performance implications of offloading a wide range of data processing operations and systems to DPUs. It provides an abstraction for automated performance testing and reporting and is easily extensible. We use dpBento to measure recent DPUs, present our benchmarking results, and highlight insights into the potential benefits of DPU offloading for data processing.