AI Summary
To address the memory bandwidth bottlenecks and inefficient near-memory processing (NMP) caused by irregular memory access patterns in hardware acceleration of fully homomorphic encryption (FHE), this paper proposes FlexMem, a highly parallel near-memory architecture. The approach introduces: (1) a configurable near-memory computing architecture supporting flexible dataflows, featuring variable-stride memory access and dynamically reconfigurable heterogeneous interconnect topologies; and (2) a dual-granularity scheduling mechanism that coordinates polynomial-level and ciphertext-level execution to match FHE's inherently irregular computation and memory access behavior. Experimental evaluation shows that FlexMem achieves 1.12× higher performance than state-of-the-art NMP accelerators for FHE while attaining a near-memory bandwidth utilization of 95.7%, thereby alleviating the memory wall limitation in FHE acceleration.
Abstract
Fully Homomorphic Encryption (FHE) imposes substantial memory bandwidth demands, presenting significant challenges for efficient hardware acceleration. Near-memory Processing (NMP) has emerged as a promising architectural solution to alleviate the memory bottleneck. However, the irregular memory access patterns and flexible dataflows inherent to FHE limit the effectiveness of existing NMP accelerators, which fail to fully utilize the available near-memory bandwidth. In this work, we propose FlexMem, a near-memory accelerator featuring highly parallel computational units with varying memory access strides and interconnect topologies to handle irregular memory access patterns effectively. Furthermore, we design polynomial-level and ciphertext-level dataflows that use near-memory bandwidth efficiently under varying degrees of polynomial parallelism and improve parallel performance. Experimental results demonstrate that FlexMem achieves a 1.12× performance improvement over state-of-the-art near-memory architectures, with 95.7% near-memory bandwidth utilization.
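To make "varying memory access strides" concrete, the sketch below lists the index pairs touched at each stage of a generic radix-2 butterfly network, the access pattern of a number-theoretic transform on polynomial coefficients. The abstract does not name the kernels behind FHE's irregular accesses, so the choice of this transform is an assumption, and the function names `stage_strides` and `butterfly_pairs` are hypothetical and not taken from FlexMem.

```python
def stage_strides(n):
    """Distance between the two operands of a butterfly at each stage.

    Assumes n is a power of two and a decimation-in-frequency ordering,
    so the stride starts at n/2 and halves at every stage.
    """
    stages = n.bit_length() - 1  # log2(n) for a power of two
    return [n >> (stage + 1) for stage in range(stages)]


def butterfly_pairs(n, stage):
    """Index pairs (i, i + stride) accessed by one stage of the network."""
    stride = n >> (stage + 1)
    pairs = []
    # Each block of 2 * stride elements is processed independently.
    for block_start in range(0, n, 2 * stride):
        for offset in range(stride):
            i = block_start + offset
            pairs.append((i, i + stride))
    return pairs


if __name__ == "__main__":
    n = 8  # toy size; FHE polynomials are far longer
    for stage, stride in enumerate(stage_strides(n)):
        print(f"stage {stage}: stride {stride}: {butterfly_pairs(n, stage)}")
```

Because the stride changes at every stage, a memory layout that keeps both operands of a butterfly within one memory partition at one stage may split them across partitions at another. A fixed-stride near-memory datapath would plausibly lose bandwidth to this kind of mismatch.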