NASiC: 3D NAND-based CAM-Selected Multibit CIM Architecture for Efficient On-Device Mixture-of-Experts LLM Inference

📅 2026-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the memory bottleneck and the inefficiency of 3D NAND-based in-memory computing in supporting dynamic sparse expert activation when deploying Mixture-of-Experts (MoE) large language models on edge devices. To overcome these challenges, the authors propose NASiC, a novel architecture that uniquely integrates content-addressable memory (CAM) with multi-bit 3D NAND in-memory computing. NASiC leverages a CAM-based gating mechanism to enable dynamic expert selection and activation computation within a single cycle, while employing block-level parallelism and in-situ signed multi-bit expansion to significantly enhance parallelism and memory utilization. Experimental results demonstrate that, compared to existing approaches, NASiC achieves 4–114.8× higher performance and 3.9–70× better energy efficiency while maintaining high inference accuracy, offering a promising pathway for efficient edge deployment of MoE models.
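The "signed multi-bit expansion" mentioned above can be illustrated in software with ordinary bit-slicing. The sketch below is not the NASiC circuit. It assumes 8-bit two's-complement weights stored as 2-bit slices (one slice per multi-level cell), with the most significant slice carrying the sign. The function names and bit widths are hypothetical and chosen only for illustration.

```python
from typing import List


def slice_signed(value: int, total_bits: int = 8, bits_per_cell: int = 2) -> List[int]:
    """Split a signed integer into little-endian slices of bits_per_cell bits.

    Assumption for this sketch: two's-complement encoding, with the top
    slice reinterpreted as a signed digit so that a plain shift-and-add
    recovers the original value.
    """
    if total_bits % bits_per_cell != 0:
        raise ValueError("total_bits must be a multiple of bits_per_cell")
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError("value does not fit in total_bits")
    unsigned = value & ((1 << total_bits) - 1)
    cell_mask = (1 << bits_per_cell) - 1
    n_slices = total_bits // bits_per_cell
    slices = [(unsigned >> (i * bits_per_cell)) & cell_mask for i in range(n_slices)]
    # Reinterpret the most significant slice as signed.
    if slices[-1] >= 1 << (bits_per_cell - 1):
        slices[-1] -= 1 << bits_per_cell
    return slices


def sliced_dot(weights: List[int], inputs: List[int],
               total_bits: int = 8, bits_per_cell: int = 2) -> int:
    """Dot product computed slice by slice, then combined by shift-and-add."""
    n_slices = total_bits // bits_per_cell
    partial = [0] * n_slices  # one accumulator per slice position
    for w, x in zip(weights, inputs):
        for i, s in enumerate(slice_signed(w, total_bits, bits_per_cell)):
            partial[i] += s * x
    return sum(p * (1 << (i * bits_per_cell)) for i, p in enumerate(partial))
```

Each entry of `partial` plays the role of one multi-bit array's accumulated output. The final shift-and-add is the step a hardware design would perform in its periphery, under the assumptions stated above.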
📝 Abstract
Mixture-of-Experts (MoE) models have emerged as the state-of-the-art paradigm for scaling up large language models (LLMs) without a proportional increase in computational cost. However, their on-device deployment faces a critical challenge: storing all expert parameters requires a large amount of memory. 3D NAND-based computing-in-memory (CIM) architectures offer high storage capacity and reduced data movement, but they are ill-suited to MoE models with dynamically sparse expert activation. The mismatch degrades effective computational parallelism and underutilizes the multibit storage capability of Flash cells. In this work, we propose a 3D NAND-based content-addressable memory (CAM)-selected CIM architecture, dubbed NASiC, tailored to MoE models. By leveraging the intrinsic string structure of 3D NAND technology, NASiC fuses dynamic expert selection (through a CAM-based masking mechanism) and activated-expert computation (through CIM) into a single computation cycle, eliminating redundant computation and enhancing computational parallelism. Moreover, circuit-level optimizations and a multibit CIM cell are co-designed with the proposed NASiC architecture, featuring block-wise parallel computation with in-situ signed multibit input and weight expansion. This substantially improves the throughput and energy efficiency of the NAND CIM array, as well as the utilization of high-density 3D NAND technology for MoE models. With extensive experimental results, we demonstrate that NASiC achieves 4–114.8× higher performance and 3.9–70× better energy efficiency than state-of-the-art designs while maintaining high accuracy, showing its potential for efficient on-device MoE LLM inference.
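As a purely functional illustration of the idea described in the abstract (selecting experts with a mask and computing only the activated ones), the following sketch models a top-k gate whose 0/1 mask decides which expert weight blocks take part in the multiply-accumulate. It is a software analogy, not the authors' hardware: the names `top_k_mask` and `masked_moe` are hypothetical, each expert is assumed to be a single weight matrix, and the gate scores are used as given, without renormalization.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def top_k_mask(gate_scores: List[float], k: int) -> List[int]:
    """Return a 0/1 mask marking the k highest-scoring experts."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda i: gate_scores[i], reverse=True)
    chosen = set(ranked[:k])
    return [1 if i in chosen else 0 for i in range(len(gate_scores))]


def matvec(weights: Matrix, x: List[float]) -> List[float]:
    """Plain matrix-vector product (stands in for one expert's MAC)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def masked_moe(experts: List[Matrix], gate_scores: List[float],
               x: List[float], k: int) -> Tuple[List[float], List[int]]:
    """Combine only the experts selected by the mask.

    Masked-out experts are skipped entirely, so no work is spent on them.
    Assumption: the output is the gate-weighted sum of selected experts.
    """
    mask = top_k_mask(gate_scores, k)
    out = [0.0] * len(experts[0])
    for selected, gate, weights in zip(mask, gate_scores, experts):
        if not selected:
            continue  # inactive expert: no computation
        y = matvec(weights, x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, mask
```

In this model the mask and the computation are two separate steps in a loop. The abstract's claim is that NASiC performs both within one computation cycle by exploiting the 3D NAND string structure, which this sketch does not attempt to reproduce.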
Problem

Research questions and friction points this paper is trying to address.

Mixture-of-Experts
3D NAND
Computing-in-Memory
On-Device Inference
Multibit Storage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Computing-in-Memory
3D NAND Flash
Mixture-of-Experts
Content-Addressable Memory
Multibit CIM
Weikai Xu
Department of Communication Engineering, Xiamen University
Chaos Communications, Wireless Communications
Meng Li
Peking University; Ex-Facebook
Efficient AI, Privacy Preserving AI
Shuzhang Zhong
Peking University
Machine Learning System
Tianyang Luo
School of Integrated Circuits, Peking University, Beijing, China
Dongxue Zhao
School of Integrated Circuits, Peking University, Beijing, China
Ling Liang
pku.edu.cn
Zongwei Wang
Peking University
Microelectronics, Memory Technology, Computing-in-Memory, Neuromorphic Computing
Qianqian Huang
Institute of Microelectronics, Peking University
Microelectronics
Yimao Cai
School of Integrated Circuits, Peking University, Beijing, China; Beijing Advanced Innovation Center for Integrated Circuits, Beijing, China
Ru Huang
School of Integrated Circuits, Peking University, Beijing, China; Beijing Advanced Innovation Center for Integrated Circuits, Beijing, China