FlexMem: High-Parallel Near-Memory Architecture for Flexible Dataflow in Fully Homomorphic Encryption

📅 2025-03-30
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the memory bandwidth bottleneck and the inefficient near-memory processing (NMP) caused by irregular memory access patterns in hardware acceleration of fully homomorphic encryption (FHE), this paper proposes FlexMem, a highly parallel near-memory architecture. The approach introduces: (1) a configurable near-memory computing architecture that supports flexible dataflows through variable-stride memory access and dynamically reconfigurable heterogeneous interconnect topologies; and (2) a dual-granularity scheduling mechanism that coordinates polynomial-level and ciphertext-level execution to match FHE's inherently irregular computation and memory access behavior. In the reported evaluation, FlexMem achieves a 1.12× speedup over state-of-the-art NMP accelerators for FHE while reaching 95.7% near-memory bandwidth utilization, easing the memory-wall limitation in FHE acceleration.
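
Why a fixed-stride datapath struggles with FHE can be illustrated with the number-theoretic transform (NTT), a standard FHE kernel. The page does not say which kernels FlexMem targets; using the NTT here is an assumption for illustration only. In an iterative radix-2 NTT, the two operands of each butterfly are `half` elements apart, and `half` doubles at every stage, so the access stride changes from stage to stage. The function names (`bit_reverse`, `ntt`, `naive_ntt`) and the toy parameters (q = 17, root = 2, n = 8) are hypothetical choices, not taken from the paper.

```python
# Illustrative sketch (assumption: NTT as a representative FHE kernel).
# Shows that butterfly strides change at every stage of a radix-2 NTT.

def bit_reverse(x: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of x."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

def ntt(a, q, root):
    """Iterative Cooley-Tukey NTT (decimation in time) modulo prime q.

    `root` must be a primitive len(a)-th root of unity mod q.
    Returns (transformed values in natural order, butterfly stride per stage).
    """
    n = len(a)
    bits = n.bit_length() - 1
    assert 1 << bits == n, "length must be a power of two"
    # Bit-reversal permutation so the output comes out in natural order.
    a = [a[bit_reverse(i, bits)] for i in range(n)]
    strides = []
    half = 1
    while half < n:
        # Primitive (2*half)-th root of unity for this stage.
        w_len = pow(root, n // (2 * half), q)
        for start in range(0, n, 2 * half):
            w = 1
            for j in range(half):
                u = a[start + j]
                v = a[start + j + half] * w % q
                a[start + j] = (u + v) % q
                a[start + j + half] = (u - v) % q
                w = w * w_len % q
        strides.append(half)  # butterfly partners are `half` elements apart
        half *= 2
    return a, strides

def naive_ntt(a, q, root):
    """O(n^2) reference: X[i] = sum_j a[j] * root^(i*j) mod q."""
    n = len(a)
    return [sum(a[j] * pow(root, i * j, q) for j in range(n)) % q
            for i in range(n)]

if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y, strides = ntt(x, 17, 2)      # 2 has order 8 modulo 17
    assert y == naive_ntt(x, 17, 2)
    print(strides)                  # [1, 2, 4]
```

For n = 8 the stride sequence is 1, 2, 4; for larger polynomials it keeps doubling up to n/2. One plausible reading of the summary is that variable-stride compute units let each stage's partners be fetched without wasting near-memory bandwidth, though the page gives no implementation details.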

Abstract
Fully Homomorphic Encryption (FHE) imposes substantial memory bandwidth demands, presenting significant challenges for efficient hardware acceleration. Near-Memory Processing (NMP) has emerged as a promising architectural solution to alleviate the memory bottleneck. However, the irregular memory access patterns and flexible dataflows inherent to FHE limit the effectiveness of existing NMP accelerators, which fail to fully utilize the available near-memory bandwidth. In this work, we propose FlexMem, a near-memory accelerator featuring highly parallel computational units with varying memory access strides and interconnect topologies to handle irregular memory access patterns effectively. Furthermore, we design polynomial-level and ciphertext-level dataflows that efficiently utilize near-memory bandwidth under varying degrees of polynomial parallelism and enhance parallel performance. Experimental results demonstrate that FlexMem achieves a 1.12× performance improvement over state-of-the-art near-memory architectures, with 95.7% near-memory bandwidth utilization.
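
The benefit of adding ciphertext-level parallelism when polynomial parallelism is low can be shown with a toy utilization model. Everything here is a hypothetical simplification, not FlexMem's scheduler: P identical units each process one polynomial per round, and the workload is C independent ciphertexts with L polynomials each (for example, RNS limbs, whose count shrinks as a ciphertext's level drops). The names `poly_level_utilization`, `ciphertext_level_utilization`, `c`, `l`, and `p` are illustrative.

```python
# Toy model (hypothetical, not the paper's scheduler): fraction of unit-rounds
# doing useful work under two parallelization granularities.
from math import ceil

def poly_level_utilization(c: int, l: int, p: int) -> float:
    """One ciphertext at a time; its l polynomials are spread over p units."""
    rounds = c * ceil(l / p)
    return (c * l) / (rounds * p)

def ciphertext_level_utilization(c: int, l: int, p: int) -> float:
    """Polynomials from all c ciphertexts share the p units in every round."""
    rounds = ceil(c * l / p)
    return (c * l) / (rounds * p)

if __name__ == "__main__":
    # 4 ciphertexts with 6 polynomials each on 16 units: L < P.
    print(poly_level_utilization(4, 6, 16))        # 0.375
    print(ciphertext_level_utilization(4, 6, 16))  # 0.75
```

When L is at least P and divisible by it, both granularities keep every unit busy; when L falls below P, polynomial-level execution alone leaves units idle. Because ceil(C·L/P) ≤ C·ceil(L/P), pooling never loses in this model, but the model ignores on-chip buffer capacity, data reuse, and interconnect cost, so it says nothing about when FlexMem actually switches between granularities.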
Problem

Research questions and friction points this paper is trying to address.

Addresses high memory bandwidth demands in Fully Homomorphic Encryption
Handles the irregular memory access patterns that limit Near-Memory Processing
Enhances near-memory bandwidth utilization for flexible dataflows
Innovation

Methods, ideas, or system contributions that make the work stand out.

Highly parallel units with varying memory access strides
Polynomial-level and ciphertext-level dataflow design
Flexible interconnect topologies for irregular accesses
Authors
Shangyi Shi
SKLP, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Cambricon Technologies
Husheng Han
Institute of Computing Technology, Chinese Academy of Sciences
Computer architecture, Security, DNN, Domain-Specific Accelerator
Jianan Mu
Institute of Computing Technology, State Key Laboratory of Processors (SKLP), CAS
Design Automation, Accelerator, Privacy Preserving Computing
Xinyao Zheng
University of California Riverside
Ling Liang
pku.edu.cn
Hang Lu
SKLP, Institute of Computing Technology, Chinese Academy of Sciences; Shanghai Innovation Center for Processor Technologies, SHIC
Zidong Du
SKLP, Institute of Computing Technology, Chinese Academy of Sciences; Shanghai Innovation Center for Processor Technologies, SHIC
Xiaowei Li
SKLP, Institute of Computing Technology, Chinese Academy of Sciences; CASTEST
Xing Hu
SKLP, Institute of Computing Technology, Chinese Academy of Sciences; Shanghai Innovation Center for Processor Technologies, SHIC
Qi Guo
SKLP, Institute of Computing Technology, Chinese Academy of Sciences