Accelerating Elliptic Curve Point Additions on Versal AI Engine for Multi-scalar Multiplication

📅 2025-02-17
🤖 AI Summary
This work addresses the computational bottleneck of multi-scalar multiplication (MSM) in zero-knowledge proofs (ZKPs). MSM over a 377-bit elliptic curve modulus requires high-precision arithmetic with carry propagation, and we accelerate its core operation, elliptic curve point addition (PADD), on the Versal ACAP platform. We propose an AI Engine (AIE)-oriented PADD design whose carry propagation uses a carry-save-like technique suited to the VLIW and SIMD capabilities of the AIEs, and we compare four spatial mapping strategies to enhance intra- and inter-task parallelism. The implementation utilizes 50.2% of the theoretical memory bandwidth and achieves a 568× speedup over the integrated CPU on the AIE evaluation board.

📝 Abstract
Multi-scalar multiplication (MSM) is crucial in cryptographic applications and computationally intensive in zero-knowledge proofs. MSM involves accumulating the products of scalars and points on an elliptic curve over a 377-bit modulus, and the Pippenger algorithm converts MSM into a series of elliptic curve point additions (PADDs) with high parallelism. This study investigates accelerating MSM on the Versal ACAP platform, an emerging hardware platform that employs a spatial architecture integrating 400 AI Engines (AIEs) with programmable logic and a processing system. AIEs are SIMD-based VLIW processors capable of performing vector multiply-accumulate operations, making them well-suited for multiplication-heavy workloads in PADD. Unlike simpler multiplication tasks in previous studies, cryptographic computations also require complex operations such as carry propagation. These operations necessitate architecture-aware optimizations, including a dedicated intra-core coding style to fully exploit VLIW capabilities and an inter-core strategy for spatial task mapping. We propose various optimizations to accelerate PADDs, including (1) algorithmic optimizations for carry propagation employing a carry-save-like technique to exploit VLIW and SIMD capabilities and (2) a comparison of four distinct spatial mappings to enhance intra- and inter-task parallelism. Our approach achieves a computational efficiency that utilizes 50.2% of the theoretical memory bandwidth and provides a 568× speedup over the integrated CPU on the AIE evaluation board.
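
To make concrete how a Pippenger-style bucket method turns MSM into a stream of point additions, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the tiny curve (`P`, `A`, `B`), the window width `c`, and all function names are hypothetical choices made for readability, and a real 377-bit curve would use multi-limb arithmetic rather than Python integers.

```python
# Toy curve y^2 = x^3 + A*x + B over F_P (illustrative only; NOT the 377-bit curve).
P, A, B = 97, 2, 3
INF = None  # point at infinity (group identity)

def padd(p1, p2):
    """Affine point addition (PADD); also handles doubling and the identity."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return INF  # p2 is the negation of p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)

def scalar_mul(k, pt):
    """Double-and-add scalar multiplication, used only as a reference."""
    acc = INF
    while k:
        if k & 1:
            acc = padd(acc, pt)
        pt = padd(pt, pt)
        k >>= 1
    return acc

def msm_naive(scalars, points):
    """Reference MSM: sum of k_i * P_i computed term by term."""
    acc = INF
    for k, pt in zip(scalars, points):
        acc = padd(acc, scalar_mul(k, pt))
    return acc

def msm_pippenger(scalars, points, c=4):
    """Bucket-method MSM: every step is a PADD; no per-point scalar_mul."""
    nbits = max(scalars, default=0).bit_length()
    windows = (nbits + c - 1) // c
    result = INF
    for w in reversed(range(windows)):
        for _ in range(c):                 # shift the running result by c bits
            result = padd(result, result)
        buckets = [INF] * (1 << c)
        for k, pt in zip(scalars, points): # bucket accumulation: independent PADDs
            idx = (k >> (w * c)) & ((1 << c) - 1)
            if idx:
                buckets[idx] = padd(buckets[idx], pt)
        running, window_sum = INF, INF
        for idx in range((1 << c) - 1, 0, -1):  # computes sum of idx * bucket[idx]
            running = padd(running, buckets[idx])
            window_sum = padd(window_sum, running)
        result = padd(result, window_sum)
    return result
```

The bucket-accumulation loop is where most additions occur, and additions into different buckets do not depend on each other, which is the kind of parallelism the abstract refers to.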

Problem

Research questions and friction points this paper is trying to address.

Accelerate elliptic curve point additions
Optimize multi-scalar multiplication
Enhance parallelism on Versal ACAP platform

Innovation

Methods, ideas, or system contributions that make the work stand out.

Versal AI Engine optimization
Carry-save technique application
Spatial task mapping enhancement
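
The general idea behind a carry-save-like technique can be sketched as follows: partial products are accumulated column by column with no carry handling, so each column update is an independent multiply-accumulate, and carries are resolved in one pass at the end. This is a generic illustration and not the paper's algorithm; the 16-bit limb width, the 24-limb layout for a 377-bit operand, and all function names are assumptions.

```python
LIMB_BITS = 16                 # assumed limb width (hypothetical choice)
MASK = (1 << LIMB_BITS) - 1

def to_limbs(x, n):
    """Split a non-negative integer into n little-endian limbs."""
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(n)]

def from_limbs(limbs):
    """Recombine little-endian limbs into an integer."""
    return sum(l << (LIMB_BITS * i) for i, l in enumerate(limbs))

def mul_deferred_carry(a, b):
    """Schoolbook multi-limb multiply with carry propagation deferred."""
    n = len(a)
    cols = [0] * (2 * n)
    # Phase 1: pure multiply-accumulate. Column sums may exceed one limb;
    # in hardware they would need a sufficiently wide accumulator.
    for i in range(n):
        for j in range(n):
            cols[i + j] += a[i] * b[j]
    # Phase 2: a single sequential pass normalizes every column to one limb.
    out, carry = [], 0
    for c in cols:
        t = c + carry
        out.append(t & MASK)
        carry = t >> LIMB_BITS
    return out  # final carry is zero because the product fits in 2n limbs

# Usage: multiply two 377-bit values held in 24 limbs of 16 bits each.
x = (1 << 377) - 12345
y = (1 << 376) + 67890
product = from_limbs(mul_deferred_carry(to_limbs(x, 24), to_limbs(y, 24)))
```

Separating the two phases keeps the data-dependent carry chain out of the inner loop, so the inner loop contains only independent multiply-accumulate work that maps naturally to vector units.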