🤖 AI Summary
This work addresses the memory overhead of trainable parameters in conventional large language models (LLMs) and the limited scalability of existing quantum-enhanced approaches on real hardware. The authors insert Cayley-parameterized unitary adapters, implemented as quantum circuit blocks, into the frozen projection layers of pre-trained LLMs and validate end-to-end inference on a 156-qubit superconducting processor of the IBM Quantum System Two. With only about 6,000 additional parameters, the method improves the perplexity of Llama 3.1 8B by 1.4%. On SmolLM2 (135M parameters), it recovers 83% of the degradation caused by compression and correctly answers questions that both classical baselines fail to answer. The study reports quantum enhancement of a billion-parameter-scale model on actual quantum hardware and identifies a sharp noise-expressivity phase transition, which the authors present as the path toward quantum utility at larger qubit scales.
📝 Abstract
Large language models (LLMs) have transformed artificial intelligence, yet classical architectures impose a fundamental constraint: every trainable parameter demands classical memory that scales unfavourably with model size. Quantum computing offers a qualitatively different pathway, but practical demonstrations on real hardware have remained elusive for models of practical relevance. Here we show that Cayley-parameterised unitary adapters -- quantum circuit blocks inserted into the frozen projection layers of pre-trained LLMs and executed on a 156-qubit IBM Quantum System Two superconducting processor -- improve the perplexity of Llama 3.1 8B, an 8-billion-parameter model in widespread use, by 1.4% with only 6,000 additional parameters, with end-to-end inference validated on a real quantum processing unit (QPU). A systematic study on SmolLM2 (135M parameters), chosen for its tractability, reveals monotonically improving perplexity with unitary block dimension, 83% recovery of compression-induced degradation, and correct answers to questions that both classical baselines fail to answer -- with a sharp noise-expressivity phase transition identifying the concrete path to quantum utility at larger qubit scales.
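For readers unfamiliar with the term, the Cayley parameterisation can be illustrated classically. The sketch below is not the authors' implementation. It only shows the standard Cayley transform, which maps a skew-Hermitian matrix A to the unitary U = (I + A)^{-1}(I - A). The function name `cayley_unitary`, the block size of 4, and the random test data are assumptions made for illustration, and nothing here touches quantum hardware.

```python
import numpy as np


def cayley_unitary(params: np.ndarray) -> np.ndarray:
    """Map an unconstrained square matrix to a unitary via the Cayley transform.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Keep only the skew-Hermitian part, so that A^H = -A.
    a = 0.5 * (params - params.conj().T)
    eye = np.eye(a.shape[0], dtype=complex)
    # U = (I + A)^{-1} (I - A). The matrix I + A is always invertible because
    # a skew-Hermitian A has purely imaginary eigenvalues.
    return np.linalg.solve(eye + a, eye - a)


# Assumed toy setting: a 4-dimensional block with random parameters.
rng = np.random.default_rng(0)
raw = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
u = cayley_unitary(raw)

# Unitarity: U^H U should equal the identity up to rounding error.
unitarity_error = np.abs(u.conj().T @ u - np.eye(4)).max()

# Zero parameters give the identity, so a block initialised at zero
# would leave the output of a frozen layer unchanged.
u_zero = cayley_unitary(np.zeros((4, 4), dtype=complex))

print("max deviation from unitarity:", unitarity_error)
print("zero parameters give identity:", np.allclose(u_zero, np.eye(4)))
```

Two properties of this map are visible in the sketch: every parameter setting yields an exactly unitary matrix, so no projection or re-normalisation step is needed during training, and zero parameters reproduce the identity. How the authors map such blocks to circuits on the QPU is not described in the abstract and is not modelled here.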