In-Network Preprocessing of Recommender Systems on Multi-Tenant SmartNICs

๐Ÿ“… 2025-01-21
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
To address the high-cost, power-intensive CPU dependency of online data preprocessing for recommender systems in shared environments, this paper proposes Piper, a network-in-the-loop preprocessing architecture leveraging multi-tenant SmartNICs to offload data loading and feature engineering to edge-network hardware. Our key contributions include: (i) the first reconfigurable MiniPipe pipeline unit enabling dynamic, multi-task deployment; and (ii) the first end-to-end hardware acceleration of the full recommender preprocessing pipeline with fine-grained multi-tenant resource sharing. Implemented on an FPGA-based prototype integrating a streaming processing engine and strict resource isolation, Piper achieves 39โ€“105ร— speedup over a 128-core CPU and 3โ€“17ร— over RTX 3090/A100 GPUs. It significantly reduces preprocessing latency and cuts preprocessing power consumption by over 60%, delivering superior low-latency, high-throughput, and energy-efficient performance.

๐Ÿ“ Abstract
Keeping ML-based recommender models up-to-date as data drifts and evolves is essential to maintain accuracy. As a result, online data preprocessing plays an increasingly important role in serving recommender systems. Existing solutions employ multiple CPU workers to saturate the input bandwidth of a single training node. Such an approach results in high deployment costs and energy consumption. For instance, a recent report from industrial deployments shows that data storage and ingestion pipelines can account for over 60% of the power consumption in a recommender system. In this paper, we tackle the issue from a hardware perspective by introducing Piper, a flexible and network-attached accelerator that executes data loading and preprocessing pipelines in a streaming fashion. As part of the design, we define MiniPipe, the smallest pipeline unit enabling multi-pipeline implementation by executing various data preprocessing tasks across the single board, giving Piper the ability to be reconfigured at runtime. Our results, using publicly released commercial pipelines, show that Piper, prototyped on a power-efficient FPGA, achieves a 39–105× speedup over a server-grade, 128-core CPU and 3–17× speedup over GPUs like RTX 3090 and A100 in multiple pipelines. The experimental analysis demonstrates that Piper provides advantages in both latency and energy efficiency for preprocessing tasks in recommender systems, providing an alternative design point for systems that today are in very high demand.
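To make the MiniPipe idea concrete, here is a minimal software sketch (not the paper's code, which targets an FPGA) of composable streaming preprocessing stages for recommender data. The stage names, the `MiniPipe` class, and the Criteo-like record layout (dense numeric features plus categorical hash features) are illustrative assumptions; the transforms shown (missing-value filling, log(1+x) smoothing, hashing categoricals into an embedding-table range) are typical recommender feature-engineering steps.

```python
# Hypothetical sketch of MiniPipe-style composable streaming stages.
# Record layout ("dense*"/"cat*" keys) and all names are assumptions,
# not the paper's actual interface.
import math

class MiniPipe:
    """One pipeline unit: a named transform applied to a record stream."""
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def run(self, stream):
        # Lazy generator: records flow through one at a time (streaming,
        # not batched), mirroring the dataflow execution described above.
        for record in stream:
            yield self.fn(record)

def fill_missing(record):
    # Replace missing values with 0, a common default for dense counters.
    return {k: (0 if v is None else v) for k, v in record.items()}

def log_transform(record):
    # log(1+x) smoothing for dense numeric features.
    return {k: (math.log1p(v) if k.startswith("dense") else v)
            for k, v in record.items()}

def hash_categoricals(record, buckets=1 << 20):
    # Map categorical IDs into a fixed embedding-table index range.
    return {k: (hash(v) % buckets if k.startswith("cat") else v)
            for k, v in record.items()}

def build_pipeline(stages, stream):
    # Chain stages lazily; swapping the stage list stands in for
    # runtime reconfiguration of pipeline units.
    for stage in stages:
        stream = stage.run(stream)
    return stream

records = [{"dense0": None, "dense1": 3, "cat0": "ad42"}]
stages = [MiniPipe("fill", fill_missing),
          MiniPipe("log", log_transform),
          MiniPipe("hash", hash_categoricals)]
out = list(build_pipeline(stages, records))
```

The point of the sketch is the composition model: each stage is independent and stateless per record, so multiple pipelines built from different stage lists can share one device, which is the multi-tenant property the hardware design exploits.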
Problem

Research questions and friction points this paper is trying to address.

Smart Network Interface Card
Recommendation System
Energy Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Piper Hardware Accelerator
In-network Preprocessing
Energy Efficiency
๐Ÿ”Ž Similar Papers
No similar papers found.