🤖 AI Summary
Traditional arbitrary-precision arithmetic algorithms exhibit strong data dependencies that keep them from effectively exploiting the SIMD parallelism of modern CPUs. This work proposes DigitsOnTurbo (DoT), which restructures the dataflow and computation patterns of big-number operations into independent, data-parallel tasks rather than merely vectorizing conventional algorithms. Over prior SIMD implementations, DoT achieves speedups of up to 1.85× for addition/subtraction and up to 2.3× for multiplication. When integrated into state-of-the-art libraries, it improves end-to-end throughput by up to 19.3% in scientific computing workloads and, in cryptographic applications, reduces latency by up to 7.9% and increases throughput by up to 5.9%.
📝 Abstract
Large-number arithmetic, widely used in scientific computing and cryptography, has seen limited adoption of single instruction, multiple data (SIMD) parallelism on modern CPUs due to the inherent dependencies in traditional algorithms. We present DigitsOnTurbo (DoT), which, rather than vectorizing the standard algorithms, restructures the computation around independent, data-parallel operations that can take full advantage of SIMD. Over prior SIMD implementations, DoT achieves speedups of up to 1.85x for addition and subtraction and up to 2.3x for multiplication. When integrated into state-of-the-art libraries, DoT yields up to 4x speedup for addition and subtraction and up to 2x speedup for multiplication, cascading into end-to-end throughput gains of up to 19.3% for scientific computations and up to 7.9% lower latency and 5.9% higher throughput for cryptographic implementations.
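The abstract does not describe DoT's algorithms in detail. To illustrate the kind of dependency it refers to, the sketch below shows a well-known alternative to the serial ripple-carry loop: reduced-radix limbs with deferred carries. It is not DoT's method. Assumptions: operands are split into 52-bit limbs held in 64-bit lanes (least-significant limb first), Python lists stand in for SIMD registers, and the helper names `to_limbs`, `from_limbs`, `normalize`, `add`, and `mul` are hypothetical. The lane-wise list comprehensions correspond to operations a SIMD unit could apply to all limbs at once, and only `normalize` moves data between neighboring lanes.

```python
# Sketch of carry-deferred big-number arithmetic (NOT the DoT algorithm).
# Assumption: 52-bit limbs in 64-bit lanes, least-significant limb first,
# leaving 12 spare bits per lane to absorb carries that are not yet propagated.
RADIX_BITS = 52
LANE_BITS = 64
MASK = (1 << RADIX_BITS) - 1


def to_limbs(x: int, n: int) -> list[int]:
    """Split a non-negative integer into n limbs of RADIX_BITS bits each."""
    return [(x >> (RADIX_BITS * i)) & MASK for i in range(n)]


def from_limbs(limbs: list[int]) -> int:
    """Reassemble an integer from its limbs."""
    return sum(limb << (RADIX_BITS * i) for i, limb in enumerate(limbs))


def normalize(lanes: list[int]) -> list[int]:
    """Resolve deferred carries with repeated lane-wise split-and-shift passes.

    Within a pass every lane is updated independently: it keeps its low
    RADIX_BITS bits and adds the carry from the lane below. A long run of
    all-ones limbs is the worst case and costs one pass per limb.
    """
    lanes = list(lanes)
    while any(v > MASK for v in lanes):
        carries = [v >> RADIX_BITS for v in lanes]
        assert carries[-1] == 0, "result does not fit in the given lanes"
        lanes = [(v & MASK) + c for v, c in zip(lanes, [0] + carries[:-1])]
    return lanes


def add(a: list[int], b: list[int]) -> list[int]:
    """Limb-wise addition of equal-length operands; no lane reads a neighbor."""
    sums = [x + y for x, y in zip(a, b)] + [0]  # extra lane for the carry-out
    return normalize(sums)


def mul(a: list[int], b: list[int]) -> list[int]:
    """Column-wise schoolbook multiplication with deferred carries.

    Each partial product a[i]*b[j] (< 2**104) is split into low and high
    52-bit halves, accumulated into columns i+j and i+j+1 without carrying.
    """
    cols = [0] * (len(a) + len(b))
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            p = x * y
            cols[i + j] += p & MASK
            cols[i + j + 1] += p >> RADIX_BITS
    # A column receives at most 2*min(len(a), len(b)) values below 2**52,
    # so for moderate sizes it still fits in a 64-bit lane.
    assert max(cols) < 1 << LANE_BITS
    return normalize(cols)


x, y = 3**200, 7**150
n = 12  # 12 * 52 = 624 bits, enough for either operand
assert from_limbs(add(to_limbs(x, n), to_limbs(y, n))) == x + y
assert from_limbs(mul(to_limbs(x, n), to_limbs(y, n))) == x * y
```

The spare bits per lane are the design choice that breaks the serial chain: many limb sums or partial-product halves can accumulate before any carry must move, and the low/high split mirrors 52-bit multiply-accumulate instructions such as those in AVX-512 IFMA. The cost is roughly 64/52 times as many limbs as a full 64-bit radix would need.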