🤖 AI Summary
Executing large-scale quantum circuits on NISQ-era hardware remains infeasible, while classical full-state-vector simulation suffers from exponential memory and computational overhead. Method: This work presents the first systematic, quantitative comparison—within a unified GPU cluster environment—between CutQC circuit cutting and Qiskit-Aer-GPU’s distributed full-circuit simulation. Contribution/Results: On a single node, full-circuit simulation outperforms CutQC by 1.8–3.2×; however, beyond 45 qubits and across multiple nodes, CutQC demonstrates superior scalability due to bounded communication overhead and slower latency growth. The approach innovatively integrates CUDA-accelerated state-vector evolution, MPI/GPU-aware memory co-scheduling, and circuit partitioning with reconstruction. This yields a novel distributed simulation paradigm for NISQ algorithm verification that balances high fidelity with practical efficiency.
📝 Abstract
Executing large quantum circuits is not feasible using the currently available NISQ (noisy intermediate-scale quantum) devices. The high costs of using real quantum devices make it further challenging to research and develop quantum algorithms. As a result, performing classical simulations is usually the preferred method for researching and validating large-scale quantum algorithms. However, these simulations require a huge amount of resources, as each additional qubit exponentially increases the computational space required. Distributed Quantum Computing (DQC) is a promising alternative to reduce the resources required for simulating large quantum algorithms at the cost of increased runtime. This study presents a comparative analysis of two simulation methods: circuit-splitting and full-circuit execution using distributed memory, each having a different type of overhead. The first method, using CutQC, cuts the circuit into smaller subcircuits and allows us to simulate a large quantum circuit on smaller machines. The second method, using Qiskit-Aer-GPU, distributes the computational space across a distributed memory system to simulate the entire quantum circuit. Results indicate that full-circuit executions are faster than circuit-splitting for simulations performed on a single node. However, circuit-splitting simulations show promising results in specific scenarios as the number of qubits is scaled.