🤖 AI Summary
This work proposes Conduit, a compiler-runtime framework that enables programmer-transparent, unified near-data processing (NDP) on SSDs by efficiently harnessing heterogeneous on-device compute resources. Existing NDP approaches typically support only one or two computation paradigms, limiting resource utilization and requiring explicit programmer intervention. Conduit addresses these limitations through an LLVM compiler extension that automatically vectorizes code at compile time and embeds instruction-level metadata. At runtime, it leverages a six-dimensional feature space and a cost model to dynamically schedule workloads at fine granularity across three NDP paradigms: In-Storage Processing (ISP), Processing-using-DRAM in the SSD (PuD-SSD), and In-Flash Processing (IFP). Evaluated on six data-intensive benchmarks, Conduit achieves an average 1.8× speedup and 46% energy reduction over the best-performing baseline, demonstrating the first transparent, general-purpose NDP capability for SSDs.
📝 Abstract
Solid-state drives (SSDs) are well suited for near-data processing (NDP) because they: (1) store large application datasets, and (2) support three NDP paradigms: in-storage processing (ISP), processing using DRAM in the SSD (PuD-SSD), and in-flash processing (IFP). Prior SSD-based NDP techniques largely operate in isolation, mapping computations to only one or two NDP paradigms (i.e., ISP, PuD-SSD, or IFP) within the SSD. These techniques (1) are tailored to specific workloads or kernels, (2) do not exploit the full computational potential of an SSD, and (3) lack programmer transparency. While several prior works propose techniques to partition computation between the host and near-memory accelerators, adapting these techniques to SSDs has limited benefits because they (1) ignore the heterogeneity of the SSD resources, and (2) make offloading decisions based on limited factors such as bandwidth utilization or data movement cost. We propose Conduit, a general-purpose, programmer-transparent NDP framework for SSDs that leverages multiple SSD computation resources. At compile time, Conduit executes a custom compiler pass (implemented in LLVM) that (i) vectorizes suitable application code segments into SIMD operations that align with the SSD's page layout, and (ii) embeds metadata (e.g., operation type, operand sizes) into the vectorized instructions to guide runtime offloading decisions. At runtime, within the SSD, Conduit performs instruction-granularity offloading by evaluating six key features, and uses a cost function to select the most suitable SSD resource. We evaluate Conduit and two prior NDP offloading techniques using an in-house event-driven SSD simulator on six data-intensive workloads. Conduit outperforms the best-performing prior offloading policy by 1.8× and reduces energy consumption by 46%.
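The runtime decision described above, evaluating per-instruction features against a cost function and picking the cheapest of the three NDP paradigms, can be sketched as follows. This is an illustrative toy model, not Conduit's actual implementation: the abstract does not enumerate the six features or the cost function, so the feature names, cost formulas, and weights below are hypothetical placeholders.

```python
# Hypothetical sketch of instruction-granularity NDP offloading:
# score each paradigm (ISP, PuD-SSD, IFP) from per-instruction
# metadata and choose the one with the lowest cost.
# All feature names and weights are illustrative, not from the paper.
from dataclasses import dataclass

@dataclass
class InstrFeatures:
    # Six example features; Conduit's actual feature set may differ.
    operand_bytes: int        # total operand size
    data_locality: float      # 0..1, fraction of operands resident in flash pages
    compute_intensity: float  # operations per byte
    parallelism: int          # usable SIMD lanes
    dram_residency: float     # 0..1, fraction of operands in SSD-internal DRAM
    result_bytes: int         # output size to write back

PARADIGMS = ("ISP", "PuD-SSD", "IFP")

def cost(paradigm: str, f: InstrFeatures) -> float:
    """Toy cost model: lower is better. Weights are made up for illustration."""
    if paradigm == "ISP":       # general-purpose cores: favors complex operations
        return f.operand_bytes / 1e6 + 2.0 / (1 + f.compute_intensity)
    if paradigm == "PuD-SSD":   # bulk parallel ops on data already in SSD DRAM
        return (1 - f.dram_residency) * f.operand_bytes / 1e6 + 1.0 / (1 + f.parallelism)
    if paradigm == "IFP":       # in-flash processing: best when data sits in flash
        return (1 - f.data_locality) * f.operand_bytes / 1e6 + f.result_bytes / 1e6
    raise ValueError(f"unknown paradigm: {paradigm}")

def select_paradigm(f: InstrFeatures) -> str:
    """Pick the paradigm with the minimum estimated cost for this instruction."""
    return min(PARADIGMS, key=lambda p: cost(p, f))
```

Under this sketch, an instruction whose operands are already laid out in flash pages would be steered to IFP, while a large bulk operation on DRAM-resident data would go to PuD-SSD; the real system would derive such features from the compiler-embedded metadata rather than hand-supplied values.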