🤖 AI Summary
Quantum hardware remains scarce, and classical quantum circuit simulators are hard to optimize across different hardware platforms. To address both problems, this paper introduces CAST, a cross-platform Schrödinger-style quantum circuit simulation toolchain. CAST features: (1) a novel sparsity-aware gate fusion algorithm that automatically selects the fusion strategy and backend configuration for the target hardware; and (2) two code-generation paths, an LLVM IR-based vectorization optimization for CPUs and a PTX-based code generator for Nvidia GPUs. In the reported experiments, CAST achieves up to 8.03× speedup over Qiskit on 32-qubit CPU benchmarks and up to 39.3× speedup over the Nvidia cuQuantum backend on 30-qubit GPU benchmarks, improving both the efficiency and the portability of large-scale quantum circuit simulation across CPU and GPU platforms.
📝 Abstract
While existing quantum hardware resources have limited availability and reliability, there is a growing demand for exploring and verifying quantum algorithms. Efficient, high-performance classical simulators are critical to meeting this demand. However, because classical hardware platforms differ widely in their characteristics, implementing hardware-specific optimizations for each of them is challenging. To address these needs, we propose CAST (Cross-platform Adaptive Schrödinger-style Simulation Toolchain), a novel compilation toolchain with cross-platform (CPU and Nvidia GPU) optimization and high-performance backend support. CAST exploits a novel sparsity-aware gate fusion algorithm that automatically selects the best fusion strategy and backend configuration for the targeted hardware platform. CAST also aims to offer versatile, high-performance backends for different hardware platforms. To this end, CAST provides an LLVM IR-based vectorization optimization for various CPU architectures and instruction sets, as well as a PTX-based code generator for Nvidia GPU support. We benchmark CAST against IBM Qiskit, Google QSimCirq, the Nvidia cuQuantum backend, and other high-performance simulators. On various 32-qubit CPU-based benchmarks, CAST achieves up to 8.03x speedup over Qiskit. On various 30-qubit GPU-based benchmarks, CAST achieves up to 39.3x speedup over the Nvidia cuQuantum backend.
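To make the idea of gate fusion in a Schrödinger-style (state-vector) simulator concrete, the following is a minimal Python sketch. It is not CAST's algorithm or API: the helper names (`kron`, `matmul`, `nnz`, `apply_gate`, `mults_per_amplitude`) and the toy cost model are assumptions introduced purely for illustration. It fuses a Hadamard gate and a CNOT into one two-qubit matrix, checks that the fused gate produces the same state as applying the two gates in sequence, and counts non-zero entries as a rough stand-in for a sparsity-aware cost.

```python
# Illustrative sketch only: hypothetical helpers, not CAST's implementation.

def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def matmul(a, b):
    """Product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def nnz(m, tol=1e-12):
    """Number of entries whose magnitude exceeds tol."""
    return sum(1 for row in m for v in row if abs(v) > tol)

def mults_per_amplitude(gate):
    """Toy cost model (an assumption): non-zero entries per matrix row."""
    return nnz(gate) / len(gate)

def apply_gate(state, gate, qubits):
    """Apply a k-qubit gate to a state vector.

    Bit q of an amplitude index is the value of qubit q; qubits[0] maps to
    the most significant bit of the gate's row/column index.
    """
    k = len(qubits)
    mask = sum(1 << q for q in qubits)
    new = list(state)
    for base in range(len(state)):
        if base & mask:
            continue  # visit each group of 2**k amplitudes exactly once
        idx = []
        for local in range(1 << k):
            i = base
            for pos, q in enumerate(qubits):
                if (local >> (k - 1 - pos)) & 1:
                    i |= 1 << q
            idx.append(i)
        amps = [state[i] for i in idx]
        for r, i in enumerate(idx):
            # Skip zero matrix entries: the sparsity-aware part of the sketch.
            new[i] = sum(gate[r][c] * amps[c]
                         for c in range(1 << k) if gate[r][c] != 0)
    return new

h = 2 ** -0.5
H = [[h, h], [h, -h]]
I2 = [[1, 0], [0, 1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # control = qubits[0]

# Fuse "H on qubit 0, then CNOT(0 -> 1)" into a single two-qubit gate.
fused = matmul(CNOT, kron(H, I2))

n_qubits = 3
state = [0.0] * (1 << n_qubits)
state[0] = 1.0  # |000>

state_seq = apply_gate(apply_gate(state, H, [0]), CNOT, [0, 1])  # two passes
state_fused = apply_gate(state, fused, [0, 1])                   # one pass

cost_seq = mults_per_amplitude(H) + mults_per_amplitude(CNOT)
cost_fused = mults_per_amplitude(fused)
```

In this toy case the fused matrix keeps only half of its 16 entries non-zero, so the fused gate needs fewer multiplications per amplitude than the two separate gates and touches the state vector once instead of twice. Whether fusing pays off in general depends on how dense the fused matrix becomes and on the target hardware, which is the kind of trade-off a sparsity-aware, hardware-adaptive fusion strategy has to weigh.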