On the Performance of Cloud-based ARM SVE for Zero-Knowledge Proving Systems

📅 2025-06-11
🤖 AI Summary
This work asks whether ARM-based cloud servers can replace x86-64 servers for zero-knowledge proof (ZKP) workloads, focusing on critical paths such as building Merkle trees with Poseidon hashes over the Goldilocks field. When building a Merkle tree with over four million leaves, AWS Graviton4 and GCP Axion are 1.6× and 1.4× slower than the latest AMD EPYC and Intel Xeon servers on AWS using AVX and AVX512, respectively. The authors attribute the gap to two causes: the 128-bit vector width of these ARM CPUs (versus 512 bits in AVX512) and their lower clock frequency. The SVE/SVE2 instruction set itself is judged at least as powerful as AVX/AVX512 and more flexible. The authors estimate, rather than measure, that a 512-bit SVE implementation would let ARM CPUs outperform their x86-64 counterparts while keeping a price advantage of at least 10%.

📝 Abstract
Zero-knowledge proofs (ZKPs) are becoming a gold standard in scaling blockchains and bringing Web3 to life. At the same time, ZKPs for transactions running on the Ethereum Virtual Machine require powerful servers with hundreds of CPU cores. The current zkProver implementation from Polygon is optimized for x86-64 CPUs by vectorizing key operations, such as Merkle tree building with Poseidon hashes over the Goldilocks field, with Advanced Vector Extensions (AVX and AVX512). With these optimizations, a ZKP for a batch of transactions is generated in less than two minutes. With the advent of ARM cloud servers, which are at least 10% cheaper than x86-64 servers, and the implementation of the ARM Scalable Vector Extension (SVE), we ask whether ARM servers can take over from their x86-64 counterparts. Unfortunately, our analysis shows that current ARM CPUs are not a match for their x86-64 competitors. Graviton4 from Amazon Web Services (AWS) and Axion from Google Cloud Platform (GCP) are 1.6× and 1.4× slower than the latest AMD EPYC and Intel Xeon servers from AWS with AVX and AVX512, respectively, when building a Merkle tree with over four million leaves. This low performance is due to (1) the smaller vector size in these ARM CPUs (128 bits versus 512 bits in AVX512) and (2) their lower clock frequency. On the other hand, the ARM SVE/SVE2 Instruction Set Architecture (ISA) is at least as powerful as AVX/AVX512 and more flexible. Moreover, we estimate that increasing the vector size to 512 bits will enable higher performance in ARM CPUs compared to their x86-64 counterparts while maintaining their price advantage.
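
To make the vector-width argument concrete, here is a minimal Python sketch of Goldilocks field arithmetic. It assumes the commonly used Goldilocks prime p = 2^64 - 2^32 + 1; the names `gl_reduce128`, `gl_mul`, and `lanes_per_vector` are illustrative and are not taken from zkProver. Each field element fits in 64 bits, so a 128-bit vector register holds two elements and a 512-bit register holds eight.

```python
# Goldilocks prime (assumed here: p = 2^64 - 2^32 + 1).
P = 2**64 - 2**32 + 1
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1
EPSILON = 2**32 - 1  # equals 2^64 mod P


def gl_reduce128(x: int) -> int:
    """Reduce a value below 2^128 modulo P without a general division.

    Uses 2^64 = 2^32 - 1 (mod P) and 2^96 = -1 (mod P), the identities
    that make this prime cheap to reduce on 64-bit hardware.
    """
    lo = x & MASK64          # bits 0..63
    hi = x >> 64             # bits 64..127
    hi_lo = hi & MASK32      # bits 64..95, weight 2^64
    hi_hi = hi >> 32         # bits 96..127, weight 2^96
    # The final % only folds a small intermediate back into [0, P).
    return (lo + hi_lo * EPSILON - hi_hi) % P


def gl_add(a: int, b: int) -> int:
    """Field addition of two elements in [0, P)."""
    return (a + b) % P


def gl_mul(a: int, b: int) -> int:
    """Field multiplication: 64x64 -> 128-bit product, then reduce."""
    return gl_reduce128(a * b)


def lanes_per_vector(vector_bits: int, element_bits: int = 64) -> int:
    """Number of field elements processed per vector instruction."""
    return vector_bits // element_bits


if __name__ == "__main__":
    print(lanes_per_vector(128), lanes_per_vector(512))
```

In this simplified view, a 512-bit implementation processes four times as many field elements per vector instruction as a 128-bit one, before clock frequency and memory effects are taken into account.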
Problem

Research questions and friction points this paper is trying to address.

Evaluating ARM SVE performance for ZKP systems
Comparing ARM and x86-64 servers for ZKP efficiency
Identifying ARM CPU limitations in Merkle tree operations
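
The Merkle tree workload named in the last item can be sketched as follows. This is a simplified illustration, not the zkProver code: SHA-256 from Python's `hashlib` stands in for Poseidon, and the function names and the power-of-two leaf count are assumptions made for the example. The point it shows is that all hashes within one level are independent of each other, which is what makes the construction a good target for SIMD lanes and multiple cores.

```python
import hashlib
from typing import List


def node_hash(left: bytes, right: bytes) -> bytes:
    # Stand-in compression function. The paper's workload uses Poseidon
    # over the Goldilocks field; SHA-256 is used here only so the sketch runs.
    return hashlib.sha256(left + right).digest()


def build_merkle_root(leaves: List[bytes]) -> bytes:
    """Build a binary Merkle tree bottom-up and return its root."""
    n = len(leaves)
    if n == 0 or n & (n - 1):
        raise ValueError("expects a non-empty, power-of-two number of leaves")
    level = [hashlib.sha256(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        # Every pair in this level can be hashed independently of the others.
        level = [node_hash(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


def internal_hash_count(num_leaves: int) -> int:
    """Number of two-to-one hashes needed for a full binary tree."""
    return num_leaves - 1


if __name__ == "__main__":
    root = build_merkle_root([bytes([i]) for i in range(8)])
    print(root.hex())
    # 2**22 leaves is an assumed example of "over four million leaves".
    print(internal_hash_count(2**22))
```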
Innovation

Methods, ideas, or system contributions that make the work stand out.

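ARM SVE/SVE2 evaluated as an alternative to AVX/AVX512 for ZKP workloads
Vectorized Merkle tree building with Poseidon hashes over the Goldilocks field
Estimate that 512-bit SVE vectors would outperform x86-64 while keeping ARM's price advantage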
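
The cost argument can be checked with simple arithmetic. The sketch below assumes that cost scales linearly with runtime (time-based billing) and takes the price ratio as exactly 0.9, the upper bound implied by "at least 10% cheaper"; the function names are illustrative. Under these assumptions, an ARM server that is 1.6× or 1.4× slower costs more per Merkle tree despite the lower price, and it only breaks even once its slowdown drops to roughly 1.11×.

```python
def relative_cost(price_ratio: float, slowdown: float) -> float:
    """Cost of one job on ARM relative to x86-64.

    price_ratio: ARM hourly price divided by x86-64 hourly price.
    slowdown:    ARM runtime divided by x86-64 runtime.
    Assumes billing proportional to runtime.
    """
    return price_ratio * slowdown


def break_even_slowdown(price_ratio: float) -> float:
    """Largest slowdown at which ARM still matches x86-64 on cost per job."""
    return 1.0 / price_ratio


if __name__ == "__main__":
    price_ratio = 0.9  # assumed: ARM exactly 10% cheaper
    for name, slowdown in [("Graviton4", 1.6), ("Axion", 1.4)]:
        print(name, round(relative_cost(price_ratio, slowdown), 2))
    print("break-even slowdown:", round(break_even_slowdown(price_ratio), 2))
```

A relative cost above 1 means the ARM instance is more expensive per job, so the price advantage only matters if wider vectors close most of the performance gap.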