RecFlash: Fast Recommendation System on In-Storage Computing with Frequency-Based Data Mapping

📅 2026-04-28

🤖 AI Summary
This work addresses the low bandwidth utilization and inference performance bottlenecks that arise in NAND flash-based in-storage computing (ISC) architectures from the irregular, random memory accesses inherent to recommendation systems. To reduce the resulting bandwidth waste, the authors propose a frequency-aware data remapping algorithm that optimizes data placement by mapping frequently accessed data to high-bandwidth regions of the memory array. The approach is tailored to recommendation system inference and preserves model accuracy. Compared to existing NAND flash-based ISC solutions, it reduces inference latency by up to 81% and energy consumption by up to 91.9%, enabling efficient, low-overhead in-storage inference acceleration.
📝 Abstract
Recommendation systems have gained wide popularity for a variety of personalized suggestion tasks, but the ever-increasing amount of user data makes real-time processing difficult. NAND flash memory-based in-storage computing (ISC) is a favorable candidate among the various acceleration approaches because flash memory typically offers a larger capacity than other memory types, so it can efficiently handle the large amount of user data needed for recommendation inference services. However, unlike other neural network applications, where data is fetched sequentially from memory, recommendation systems exhibit an irregular, random memory access pattern. As a result, most of the data loaded from the NAND flash array into the page buffer goes unused and a large portion of the internal bandwidth is underutilized, which degrades the performance of recommendation inference acceleration. In this paper, we propose RecFlash, a fast recommendation inference accelerator that applies a data remapping algorithm to NAND flash-based ISC. The experimental results show that our proposed method improves latency and energy consumption by up to 81% and 91.9%, respectively, over the existing NAND flash-based ISC architecture.
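The page-buffer underutilization described above can be illustrated with a toy model. The sketch below is not the RecFlash algorithm, whose details are not given here. It only shows the general idea of frequency-based placement: rows that are looked up often are packed into the same pages, so a query touches fewer distinct pages. Everything in it is an assumption made for illustration: the table size `ROWS`, the page capacity `ROWS_PER_PAGE`, the Zipf-like synthetic trace, the helper names, and the cost model in which each distinct page touched by a query costs one page read.

```python
import random
from collections import Counter

# All constants are hypothetical and chosen only for illustration.
ROWS = 4096             # embedding rows in the table
ROWS_PER_PAGE = 16      # rows that fit in one flash page
LOOKUPS_PER_QUERY = 32  # embedding lookups per inference query


def make_queries(num_queries, seed=0):
    """Generate a synthetic trace with Zipf-like row popularity.

    Popular rows are scattered over the table, mimicking the irregular
    access pattern of a recommendation workload.
    """
    rng = random.Random(seed)
    row_ids = list(range(ROWS))
    rng.shuffle(row_ids)  # popularity rank -> row id, scattered
    weights = [1.0 / (rank + 1) for rank in range(ROWS)]
    return [rng.choices(row_ids, weights=weights, k=LOOKUPS_PER_QUERY)
            for _ in range(num_queries)]


def frequency_remap(queries):
    """Map each row to a new slot, most frequently accessed rows first."""
    counts = Counter(row for query in queries for row in query)
    ordered = sorted(range(ROWS), key=lambda row: (-counts[row], row))
    return {row: slot for slot, row in enumerate(ordered)}


def pages_read(queries, mapping=None):
    """Count page reads, assuming one read per distinct page per query."""
    total = 0
    for query in queries:
        slots = [row if mapping is None else mapping[row] for row in query]
        total += len({slot // ROWS_PER_PAGE for slot in slots})
    return total


# Profile access frequencies on one part of the trace, evaluate on the rest.
trace = make_queries(2500)
profile, evaluate = trace[:2000], trace[2000:]
mapping = frequency_remap(profile)

baseline = pages_read(evaluate)           # original index-order layout
remapped = pages_read(evaluate, mapping)  # frequency-sorted layout
print(f"page reads: baseline={baseline}, remapped={remapped}")
```

In this toy setting the remapped layout needs fewer page reads because the hottest rows share a handful of pages. How large the gain is depends entirely on the assumed skew and page size, so the numbers it prints say nothing about the 81% and 91.9% figures reported for RecFlash.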
Problem

Research questions and friction points this paper is trying to address.

recommendation system
in-storage computing
NAND flash memory
random memory access
bandwidth underutilization
Innovation

Methods, ideas, or system contributions that make the work stand out.

in-storage computing
recommendation system
NAND flash memory
data remapping
frequency-based mapping