🤖 AI Summary
Current quantum high-performance computing systems lack effective fault-tolerance and recovery mechanisms, and conventional checkpointing approaches based on preserving quantum states are ruled out by the no-cloning theorem. This work proposes an algorithm-level fault-tolerance framework that reframes checkpointing and recovery as problems of control flow and algorithmic state management, thereby avoiding direct storage of quantum states. Instead, it uses mid-circuit measurements, classical feedforward, and conditional operations in dynamic quantum circuits to capture and restore program execution state. The approach targets interruption and resumption of representative iterative or staged quantum algorithms, including variational eigensolvers, the Quantum Approximate Optimization Algorithm (QAOA), and time-stepping simulations, with the aim of making quantum computational tasks restartable and more resilient.
📝 Abstract
In this work, we explore the design of a checkpointing and restoration mechanism for quantum HPC that leverages dynamic circuit technology to enable restartable and resilient quantum execution. Rather than attempting to checkpoint quantum states, our approach redefines checkpointing as a control-flow and algorithmic-state problem. By exploiting mid-circuit measurements, classical feedforward, and conditional execution supported by dynamic circuits, we capture sufficient program state to allow correct restoration of quantum workflows after interruption or failure. This design aligns naturally with iterative and staged quantum algorithms such as variational eigensolvers, quantum approximate optimization, and time-stepping methods commonly used in quantum simulation and scientific computing.
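To make the idea of checkpointing as an algorithmic-state problem concrete, the following is a minimal sketch for an iterative variational loop. It is not the authors' implementation. It assumes that the state worth saving is purely classical: the iteration counter, the current parameters, and the state of the random number generator. The names `estimate_energy`, `save_checkpoint`, `load_checkpoint`, and `run` are hypothetical, and the quantum expectation-value estimate is replaced by a noisy classical stand-in so that the example runs without quantum hardware or an SDK.

```python
import json
import math
import os
import random
import tempfile


def estimate_energy(params, rng):
    """Hypothetical stand-in for a quantum expectation-value estimate.

    A real workflow would submit a parameterized circuit here; this toy
    cost plus Gaussian noise only emulates a shot-noise-limited result.
    """
    exact = sum(1.0 - math.cos(p) for p in params)
    return exact + rng.gauss(0.0, 0.01)


def save_checkpoint(path, state):
    """Write the classical state atomically: temp file first, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path) as fh:
        return json.load(fh)


def run(path, n_params=2, total_steps=20, lr=0.2, seed=7, stop_after=None):
    """Run (or resume) the loop; stop_after emulates an interruption."""
    rng = random.Random(seed)
    state = load_checkpoint(path)
    if state is None:
        state = {"step": 0, "params": [1.0] * n_params, "rng": None}
    else:
        # JSON turns tuples into lists, so rebuild the tuple shape
        # that random.Random.setstate expects.
        version, internal, gauss_next = state["rng"]
        rng.setstate((version, tuple(internal), gauss_next))

    executed = 0
    while state["step"] < total_steps:
        if stop_after is not None and executed == stop_after:
            return state  # emulated failure right after a checkpoint
        params = state["params"]
        grad = []
        for i in range(len(params)):
            # Parameter-shift style gradient: two evaluations per parameter.
            plus = list(params)
            minus = list(params)
            plus[i] += math.pi / 2
            minus[i] -= math.pi / 2
            diff = estimate_energy(plus, rng) - estimate_energy(minus, rng)
            grad.append(0.5 * diff)
        state["params"] = [p - lr * g for p, g in zip(params, grad)]
        state["step"] += 1
        state["rng"] = rng.getstate()
        save_checkpoint(path, state)  # checkpoint at the iteration boundary
        executed += 1
    return state
```

Calling `run(path, stop_after=4)` and then `run(path)` on the same file resumes from the fifth iteration. Because the generator state is saved alongside the parameters, the resumed run in this sketch reproduces the uninterrupted one. No quantum state is stored at any point: each iteration re-prepares its circuits from the saved parameters, which is why checkpoints are placed at iteration boundaries.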