🤖 AI Summary
This work proposes a deep learning–based error mitigation method that corrects inaccuracies in the output distributions of noisy quantum circuits. Using a sequence-to-sequence attention model, the approach integrates multidimensional information (quantum circuit structure, device-specific characteristics, and observed noisy outputs) to reconstruct distributions that better approximate the ideal noiseless case. The study systematically evaluates a range of deep learning architectures for this task, identifying Transformer-based models as the most effective. Notably, the method generalizes well, supporting transfer learning across different circuit families and quantum hardware platforms. Experiments on both simulated data and real measurements from IBM superconducting quantum processors with up to five qubits show consistent, significant improvements over existing baselines, with stable performance across varying circuit depths and successful cross-device adaptation without full retraining.
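As a concrete illustration, the sketch below shows one plausible way such a sequence-to-sequence, attention-based mitigation model could be assembled in PyTorch. The class name `MitigationTransformer`, the dimensions, and the input encoding are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch (PyTorch) of an attention-based mitigation model of the kind
# described above. All names, sizes, and the exact feature encoding are
# assumptions for illustration only.
import torch
import torch.nn as nn

class MitigationTransformer(nn.Module):
    def __init__(self, d_model=64, nhead=4, num_layers=3, n_outcomes=32):
        super().__init__()
        # Each input token is one scalar feature: e.g. a circuit-gate encoding,
        # a device-calibration entry, or one bin of the noisy distribution.
        self.embed = nn.Linear(1, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Map the pooled representation to a probability distribution over the
        # 2^n measurement outcomes (n_outcomes = 32 for five qubits).
        self.head = nn.Linear(d_model, n_outcomes)

    def forward(self, features):
        # features: (batch, seq_len, 1) -- circuit, device, and noisy-output
        # features concatenated into one token sequence (assumed encoding).
        h = self.encoder(self.embed(features))
        logits = self.head(h.mean(dim=1))      # pool over the token sequence
        return torch.softmax(logits, dim=-1)   # valid mitigated distribution

# Usage: mitigate a batch of noisy five-qubit output distributions.
model = MitigationTransformer()
noisy_features = torch.rand(8, 48, 1)  # 8 circuits, 48 feature tokens each
mitigated = model(noisy_features)      # (8, 32); each row sums to 1
```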
📝 Abstract
We present a systematic investigation of deep learning methods for quantum error mitigation of noisy output probability distributions from measured quantum circuits. We compare architectures ranging from fully connected neural networks to Transformers, and we test different design and training modalities, identifying sequence-to-sequence, attention-based models as the most effective on our datasets. These models consistently produce mitigated distributions closer to the ideal outputs when tested on both simulated and real device data obtained from IBM superconducting quantum processing units (QPUs) with up to five qubits. Across several circuit depths, our approach outperforms baseline error mitigation techniques. We perform a series of ablation studies examining how different input features (circuit structure, device properties, noisy output statistics) affect performance; cross-dataset generalization across circuit families; and transfer learning to a different IBM QPU. We observe that models generalize effectively across similar devices with the same architecture, without needing to be fully retrained.
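The cross-device transfer result suggests a standard fine-tuning recipe. The hypothetical sketch below freezes the shared encoder of the model from the previous sketch and adapts the remaining parameters on a small set of measurements from the target QPU; the frozen/trainable split and the KL-divergence objective are assumptions, not details taken from the paper.

```python
# Hypothetical transfer-learning setup: reuse a mitigation model trained on
# one QPU and lightly fine-tune it on data from another. Uses the
# MitigationTransformer class from the sketch above.
import torch

model = MitigationTransformer()
# model.load_state_dict(torch.load("device_a.pt"))  # weights from source QPU

for p in model.encoder.parameters():   # freeze the shared representation
    p.requires_grad = False

optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-4
)
# KLDivLoss expects log-probabilities as input and probabilities as target.
loss_fn = torch.nn.KLDivLoss(reduction="batchmean")

def finetune_step(features, ideal_dist):
    """One adaptation step on (noisy features, ideal distribution) pairs
    collected on the target device."""
    optimizer.zero_grad()
    mitigated = model(features)
    loss = loss_fn(torch.log(mitigated + 1e-12), ideal_dist)
    loss.backward()
    optimizer.step()
    return loss.item()
```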