Are queries and keys always relevant? A case study on Transformer wave functions

📅 2024-05-29
🏛️ Machine Learning: Science and Technology
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work investigates the applicability and efficiency of dot-product attention mechanisms for parameterizing ground-state wave functions of the two-dimensional $J_1$–$J_2$ Heisenberg model within variational quantum Monte Carlo (VQMC). We observe that standard query-key attention weights become effectively input-agnostic in the late stages of optimization, and theoretical analysis shows that queries and keys can then be omitted. Motivated by this, we propose a lightweight, position-driven attention variant that depends solely on spin coordinates, enforcing physically motivated short-range correlation priors while drastically reducing computational overhead. Experiments demonstrate that this approach achieves comparable ground-state energy accuracy (relative error < 0.1%) with ~40% fewer parameters and significantly lower training cost. Attention maps confirm convergence to static, position-dominated weight patterns. This work establishes an interpretable, low-redundancy paradigm for structured neural-network modeling of quantum many-body systems.
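To make the comparison concrete, here is a minimal PyTorch sketch (my own illustration, not the authors' implementation) contrasting standard dot-product attention with a position-only variant whose attention weights are learned parameters that never see the input. The class names, the dense `alpha` matrix, and the initialization scale are assumptions made for illustration:

```python
# Minimal sketch (illustrative, not the authors' code): standard
# dot-product attention versus a position-only variant whose weights
# are learned constants, independent of the input spin configuration.
import torch
import torch.nn as nn


class DotProductAttention(nn.Module):
    """Standard single-head attention: weights computed from queries and keys."""

    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model, bias=False)
        self.k = nn.Linear(d_model, d_model, bias=False)
        self.v = nn.Linear(d_model, d_model, bias=False)
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_sites, d_model); the attention map depends on x.
        weights = torch.softmax(self.q(x) @ self.k(x).T * self.scale, dim=-1)
        return weights @ self.v(x)


class PositionOnlyAttention(nn.Module):
    """Simplified attention: one learned weight per pair of lattice sites,
    shared across all inputs (a hypothetical dense parameterization)."""

    def __init__(self, n_sites: int, d_model: int):
        super().__init__()
        self.v = nn.Linear(d_model, d_model, bias=False)
        # Attention matrix indexed purely by site positions; no queries/keys.
        self.alpha = nn.Parameter(0.01 * torch.randn(n_sites, n_sites))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_sites, d_model); the attention map never sees x.
        return self.alpha @ self.v(x)


x = torch.randn(36, 32)  # e.g. spin embeddings on a 6x6 lattice, d_model = 32
print(DotProductAttention(32)(x).shape)        # torch.Size([36, 32])
print(PositionOnlyAttention(36, 32)(x).shape)  # torch.Size([36, 32])
```

A translationally invariant version would additionally constrain `alpha[i, j]` to depend only on the displacement between sites $i$ and $j$, cutting the parameter count of the attention matrix from $n^2$ to $n$.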

📝 Abstract
The dot product attention mechanism, originally designed for natural language processing tasks, is a cornerstone of modern Transformers. It adeptly captures semantic relationships between word pairs in sentences by computing a similarity overlap between queries and keys. In this work, we explore the suitability of Transformers, focusing on their attention mechanisms, in the specific domain of the parametrization of variational wave functions to approximate ground states of quantum many-body spin Hamiltonians. Specifically, we perform numerical simulations of the two-dimensional $J_1$-$J_2$ Heisenberg model, a common benchmark in the field of quantum many-body systems on the lattice. By comparing the performance of standard attention mechanisms with a simplified version that excludes queries and keys, relying solely on positions, we achieve competitive results while reducing computational cost and parameter usage. Furthermore, through the analysis of the attention maps generated by standard attention mechanisms, we show that the attention weights become effectively input-independent at the end of the optimization. We support the numerical results with analytical calculations, providing physical insight into why queries and keys can, in principle, be omitted from the attention mechanism when studying large systems.
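In symbols, the contrast the abstract draws can be sketched as follows (my reconstruction; the precise normalization and parameter tying used in the paper may differ):

```latex
% Standard attention: weights depend on the input configuration x
% through the query and key projections W_Q and W_K.
\[
A_{ij}(x) \;=\; \operatorname{softmax}_{j}\!\left(
  \frac{(W_Q x_i)\cdot(W_K x_j)}{\sqrt{d}}
\right)
\qquad\longrightarrow\qquad
A_{ij} \;=\; \alpha_{ij},
\]
% Simplified attention: the \alpha_{ij} are variational parameters
% learned directly, with no dependence on x; they can additionally be
% tied by relative position, \alpha_{ij} = \alpha_{i-j}.
```

The paper's numerical observation is that the left-hand weights converge to an effectively input-independent pattern during optimization, which motivates adopting the right-hand form from the start.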
Problem

Research questions and friction points this paper is trying to address.

Quantum Many-Body Spin Systems
Transformer Architecture
Attention Mechanism
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer Architecture
Quantum Spin Systems
Simplified Attention Mechanism
Riccardo Rende
International School for Advanced Studies, Trieste, Italy
Luciano Loris Viteritti
University of Trieste, Trieste, Italy