Ironman: Accelerating Oblivious Transfer Extension for Privacy-Preserving AI with Near-Memory Processing

📅 2025-07-22
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Privacy-preserving machine learning (PPML) relies on oblivious transfer (OT) for secure nonlinear function evaluation, yet CPU-based OT implementations constitute a critical end-to-end latency bottleneck. This work proposes a hardware–software co-design: a customized accelerator for single-point correlated OT (SPCOT), integrated with a near-memory processing (NMP) architecture featuring memory-side caching and index sorting to optimize irregular memory access patterns; additionally, it employs a hardware-friendly SPCOT-LPN joint algorithm. Experimental results across multiple NMP configurations demonstrate 39.2×–237.4× higher OT throughput and 2.1×–3.4× reduced end-to-end latency for CNN and Transformer models. To the best of our knowledge, this is the first work to deeply integrate SPCOT hardware acceleration with NMP, effectively alleviating the long-standing OT performance bottleneck in PPML.

📝 Abstract
With the wide application of machine learning (ML), privacy concerns arise over user data, which may contain sensitive information. Privacy-preserving ML (PPML) based on cryptographic primitives has emerged as a promising solution, in which an ML model is computed directly on encrypted data to provide a formal privacy guarantee. However, PPML frameworks heavily rely on the oblivious transfer (OT) primitive to compute nonlinear functions. OT mainly involves the computation of single-point correlated OT (SPCOT) and learning parity with noise (LPN) operations. Since OT is still computed extensively on general-purpose CPUs, it has become the latency bottleneck of modern PPML frameworks. In this paper, we propose a novel OT accelerator, dubbed Ironman, to significantly improve the efficiency of OT and of the overall PPML framework. We observe that SPCOT is computation-bounded, and thus propose a hardware-friendly SPCOT algorithm with a customized accelerator to improve SPCOT computation throughput. In contrast, LPN is memory-bandwidth-bounded due to its irregular memory access patterns. Hence, we further leverage a near-memory processing (NMP) architecture equipped with a memory-side cache and index sorting to improve effective memory bandwidth. With extensive experiments, we demonstrate that Ironman achieves a 39.2×–237.4× improvement in OT throughput across different NMP configurations compared to the full-thread CPU implementation. For different PPML frameworks, Ironman demonstrates a 2.1×–3.4× reduction in end-to-end latency for both CNN and Transformer models.
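The abstract states that LPN is memory-bandwidth-bounded because of irregular memory access patterns. A minimal sketch of why, assuming a dual-LPN-style sparse expansion over GF(2); the function name, the sparsity parameter `d=10`, and the RNG-driven indexing are illustrative assumptions, not the paper's parameters:

```python
import random

def lpn_expand(seed, dense, n, k, d=10):
    """Illustrative sketch only (not Ironman's implementation): each of the
    n outputs XORs d words of a k-word input vector at pseudorandom
    positions.  The scattered dense[idx] reads below are the irregular,
    cache-unfriendly access pattern that makes LPN memory-bandwidth-bound
    on a general-purpose CPU."""
    rng = random.Random(seed)       # public, seed-derived index stream
    out = []
    for _ in range(n):
        acc = 0
        for _ in range(d):
            idx = rng.randrange(k)  # pseudorandom column index
            acc ^= dense[idx]       # irregular gather across the whole input
        out.append(acc)
    return out
```

Because the indices jump across the entire k-word input, consecutive reads rarely share a cache line, so throughput is limited by memory bandwidth rather than by the XOR arithmetic itself.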
Problem

Research questions and friction points this paper is trying to address.

Accelerating oblivious transfer for privacy-preserving AI
Reducing latency bottleneck in PPML frameworks
Optimizing SPCOT and LPN computations with hardware
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hardware-friendly SPCOT algorithm for computation efficiency
Near-memory processing to improve memory bandwidth
Customized accelerator for OT throughput improvement
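The summary names index sorting as the NMP-side remedy for these irregular accesses. A hedged software sketch of the idea, where representing the work as `(out_row, src_index)` pairs is an illustrative assumption rather than the paper's exact hardware scheme:

```python
def lpn_gather_sorted(dense, accesses, n_out):
    """Sketch of the index-sorting idea (assumed detail, not the paper's
    hardware design).  `accesses` holds (out_row, src_index) pairs of an
    LPN-style matrix-vector product over GF(2).  Sorting by src_index makes
    the reads of `dense` monotonic, so nearby indices share cache lines and
    the memory walk becomes streaming-friendly."""
    out = [0] * n_out
    for row, idx in sorted(accesses, key=lambda p: p[1]):
        out[row] ^= dense[idx]
    return out
```

The reordering is safe because XOR is commutative and associative: accumulating the same (row, index) pairs in any order yields identical outputs.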
Authors
Chenqi Lin — Peking University
Kang Yang — State Key Laboratory of Cryptology
Tianshi Xu — Peking University
Ling Liang — Peking University
Yufei Wang — Alibaba Group
Zhaohui Chen — Peking University (Applied Cryptography, Computer Architectures)
Runsheng Wang — Peking University
Mingyu Gao — Tsinghua University (Computer Architecture, Memory Systems, Hardware Security, Domain-Specific Acceleration)
Meng Li — Peking University