Quantum Doubly Stochastic Transformers

📅 2025-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the training instability that Softmax normalization can cause in Transformers. It proposes the Quantum Doubly Stochastic Transformer (QDSFormer), a hybrid classical-quantum architecture that replaces the Softmax in the self-attention layer with a variational quantum circuit that outputs parameterized doubly stochastic matrices (DSMs). The circuit introduces a quantum inductive bias with no known classical analogue and, in the authors' analysis, yields more diverse DSMs that preserve information better than classical operators. The paper also introduces a quantum-inspired doubly stochastic Transformer based on QR decomposition, which serves as a classical point of comparison. Across multiple small-scale object recognition tasks, QDSFormer consistently surpasses a standard Vision Transformer (ViT) and the Sinkformer, with improved training stability and lower performance variation.

📝 Abstract

At the core of the Transformer, the Softmax normalizes the attention matrix to be right stochastic. Previous research has shown that this often destabilizes training and that enforcing the attention matrix to be doubly stochastic (through Sinkhorn's algorithm) consistently improves performance across different tasks, domains and Transformer flavors. However, Sinkhorn's algorithm is iterative, approximate and non-parametric, and thus inflexible with respect to the obtained doubly stochastic matrix (DSM). Recently, it has been proven that DSMs can be obtained with a parametric quantum circuit, yielding a novel quantum inductive bias for DSMs with no known classical analogue. Motivated by this, we demonstrate the feasibility of a hybrid classical-quantum doubly stochastic Transformer (QDSFormer) that replaces the Softmax in the self-attention layer with a variational quantum circuit. We study the expressive power of the circuit and find that it yields more diverse DSMs that better preserve information than classical operators. Across multiple small-scale object recognition tasks, we find that our QDSFormer consistently surpasses both a standard Vision Transformer (ViT) and other doubly stochastic Transformers. Beyond the established Sinkformer, this comparison includes a novel quantum-inspired doubly stochastic Transformer (based on QR decomposition) that can be of independent interest. The QDSFormer also shows improved training stability and lower performance variation, suggesting that it may mitigate the notoriously unstable training of ViTs on small-scale data.
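The difference between a right stochastic and a doubly stochastic attention matrix can be made concrete with a small NumPy sketch. It is illustrative only and is not the paper's implementation: the function names `softmax_rows` and `sinkhorn`, the iteration count, and the random 4x4 score matrix are all assumptions chosen for the example.

```python
import numpy as np


def softmax_rows(scores):
    """Row-wise Softmax: every row sums to 1 (right stochastic)."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def sinkhorn(scores, n_iters=200):
    """Sinkhorn normalization: alternately rescale rows and columns.

    The result is only approximately doubly stochastic after a finite
    number of iterations, which is the 'iterative, approximate' caveat.
    """
    mat = np.exp(scores - scores.max())  # strictly positive entries
    for _ in range(n_iters):
        mat = mat / mat.sum(axis=1, keepdims=True)  # rows sum to 1
        mat = mat / mat.sum(axis=0, keepdims=True)  # columns sum to 1
    return mat


rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))  # hypothetical attention scores

attn_softmax = softmax_rows(scores)
attn_sinkhorn = sinkhorn(scores)

print("Softmax column sums: ", attn_softmax.sum(axis=0))
print("Sinkhorn row sums:   ", attn_sinkhorn.sum(axis=1))
print("Sinkhorn column sums:", attn_sinkhorn.sum(axis=0))
```

Softmax constrains only the rows, so its column sums are generally not 1, whereas the Sinkhorn output approaches unit row and column sums. The procedure has no learnable parameters of its own, which is the inflexibility the abstract points to.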

Problem

Research questions and friction points this paper is trying to address.

Replacing the Softmax with a variational quantum circuit to obtain doubly stochastic attention

Improving Transformer performance and training stability via quantum-generated DSMs

Exploring a quantum inductive bias that yields more diverse attention matrices

Innovation

Methods, ideas, or system contributions that make the work stand out.

Replaces the Softmax in self-attention with a variational quantum circuit

Generates more diverse doubly stochastic matrices than classical operators

Improves training stability and performance on small-scale object recognition tasks
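One standard way to obtain an exactly doubly stochastic matrix from free parameters, which may help in reading the QR-based variant mentioned in the abstract, is to take the elementwise squared moduli of an orthogonal (or unitary) matrix: every row and column of such a matrix has unit norm, so the squared entries sum to 1 along both axes. The sketch below uses NumPy's QR decomposition to produce the orthogonal factor. It is an assumption about how such a construction can look, not the paper's circuit or its exact QR-based operator, and the name `qr_doubly_stochastic` is hypothetical.

```python
import numpy as np


def qr_doubly_stochastic(params):
    """Map a square parameter matrix to a doubly stochastic matrix.

    The Q factor of a QR decomposition is orthogonal (unitary for complex
    input), so its squared moduli form a doubly stochastic matrix.
    """
    q, _ = np.linalg.qr(params)
    return np.abs(q) ** 2


rng = np.random.default_rng(0)
params = rng.normal(size=(4, 4))  # hypothetical free parameters

dsm = qr_doubly_stochastic(params)

print("Row sums:   ", dsm.sum(axis=1))
print("Column sums:", dsm.sum(axis=0))
```

Unlike Sinkhorn normalization, this mapping needs no iteration, and the resulting matrix is doubly stochastic up to floating-point error for any full-rank parameter matrix.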

🔎 Similar Papers

Learning with SASQuaTCh: a Novel Variational Quantum Transformer Architecture with Kernel-Based Self-Attention

2024-03-21, arXiv.org, Citations: 0

A mathematical perspective on Transformers

2023-12-17, Bulletin of the American Mathematical Society, Citations: 59
