Quantum-enhanced Large Language Models on Quantum Hardware via Cayley Unitary Adapters

📅 2026-05-07

🤖 AI Summary
This work addresses two limitations. Conventional large language models (LLMs) have a high memory overhead, because every trainable parameter occupies classical memory. Existing quantum-enhanced approaches scale poorly on real hardware. The authors insert Cayley-parameterized unitary adapters into the frozen projection layers of pre-trained LLMs and run end-to-end inference on IBM Quantum System Two, a 156-qubit superconducting processor. With only about 6,000 additional parameters, the method reduces the perplexity of Llama 3.1 8B by 1.4%. On SmolLM2 (135M parameters), it recovers 83% of the compression-induced degradation and correctly answers questions on which both classical baselines fail. The authors present this as the first demonstration of quantum enhancement for a billion-parameter-scale model on actual quantum hardware. They also identify a sharp phase transition between noise and expressivity, which points toward practical quantum-augmented language modeling at larger qubit scales.
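The adapter construction named in the summary can be illustrated classically. What follows is a minimal pure-Python sketch under stated assumptions. It uses the standard Cayley transform U = (I - A)(I + A)^{-1}, which maps any skew-Hermitian matrix A to a unitary. It also maps d² real parameters to A, which matches the real dimension of the unitary group U(d). Several choices are hypothetical and are not taken from the paper:

- the helper names `skew_hermitian`, `cayley` and `adapter_forward`;
- sharing one d x d unitary across every length-d chunk of the input;
- placing the rotation in front of a frozen weight W.

The paper's actual circuit compilation, measurement scheme and noise behavior are not modeled.

```python
"""Classical simulation sketch of a Cayley-parameterized unitary adapter.

Hypothetical design (not taken from the paper): a shared d x d unitary U,
built by the Cayley transform U = (I - A)(I + A)^{-1} of a skew-Hermitian A,
rotates each length-d chunk of the input before a frozen projection W,
so y = W (U_blockdiag x). Measurement, noise and circuit compilation are not modeled.
"""
from typing import List

Matrix = List[List[complex]]


def identity(n: int) -> Matrix:
    return [[1.0 + 0j if i == j else 0j for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(m)
    aug = [row[:] + eye_row for row, eye_row in zip(m, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def skew_hermitian(params: List[float], d: int) -> Matrix:
    """Map d*d real parameters to a skew-Hermitian matrix A (A^H = -A)."""
    assert len(params) == d * d
    a = [[0j] * d for _ in range(d)]
    it = iter(params)
    for i in range(d):              # d purely imaginary diagonal entries
        a[i][i] = 1j * next(it)
    for i in range(d):              # d(d-1)/2 complex upper-triangular entries
        for j in range(i + 1, d):
            z = complex(next(it), next(it))
            a[i][j] = z
            a[j][i] = -z.conjugate()
    return a


def cayley(a: Matrix) -> Matrix:
    """U = (I - A)(I + A)^{-1}; I + A is always invertible for skew-Hermitian A."""
    n = len(a)
    eye = identity(n)
    i_minus = [[eye[r][c] - a[r][c] for c in range(n)] for r in range(n)]
    i_plus = [[eye[r][c] + a[r][c] for c in range(n)] for r in range(n)]
    return matmul(i_minus, inverse(i_plus))


def adapter_forward(w: Matrix, u: Matrix, x: List[complex]) -> List[complex]:
    """y = W (U_blockdiag x): rotate each length-d chunk of x by U, then apply frozen W."""
    d = len(u)
    assert len(x) % d == 0
    rotated: List[complex] = []
    for start in range(0, len(x), d):
        chunk = x[start:start + d]
        rotated.extend(sum(u[i][k] * chunk[k] for k in range(d)) for i in range(d))
    return [sum(w_row[k] * rotated[k] for k in range(len(rotated))) for w_row in w]


if __name__ == "__main__":
    d = 4  # hypothetical block dimension; a 4 x 4 unitary acts on 2 qubits
    params = [0.05 * (k % 7 - 3) for k in range(d * d)]  # illustrative values only
    u = cayley(skew_hermitian(params, d))
    u_dag = [[u[j][i].conjugate() for j in range(d)] for i in range(d)]
    prod = matmul(u_dag, u)
    err = max(abs(prod[i][j] - (1 if i == j else 0)) for i in range(d) for j in range(d))
    print(f"real parameters per block: {d * d}, max |U^H U - I| = {err:.1e}")
```

With every parameter at zero, A = 0 and U = I, so the adapter starts as an exact no-op on the frozen layer. This is one convenient property of the Cayley map. Its known limitation is that it cannot reach unitaries that have -1 as an eigenvalue.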
📝 Abstract
Large language models (LLMs) have transformed artificial intelligence, yet classical architectures impose a fundamental constraint: every trainable parameter demands classical memory that scales unfavourably with model size. Quantum computing offers a qualitatively different pathway, but practical demonstrations on real hardware have remained elusive for models of practical relevance. Here we show that Cayley-parameterised unitary adapters improve the perplexity of Llama 3.1 8B, an 8-billion-parameter model in widespread use, by 1.4% with only 6,000 additional parameters. These adapters are quantum circuit blocks inserted into the frozen projection layers of pre-trained LLMs and executed on a 156-qubit IBM Quantum System Two superconducting processor. End-to-end inference was validated on a real Quantum Processing Unit (QPU). We also ran a systematic study on SmolLM2 (135M parameters), chosen for its tractability. It shows three results. Perplexity improves monotonically with unitary block dimension. The adapters recover 83% of compression-induced degradation. They also produce correct answers to questions on which both classical baselines fail. A sharp noise-expressivity phase transition identifies the concrete path to quantum utility at larger qubit scales.
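The headline numbers are relative changes in perplexity. The sketch below shows the arithmetic under several assumptions:

- token log-probabilities use the natural log;
- relative reduction is (PPL_base - PPL_adapted) / PPL_base;
- "recovery" is the fraction of the perplexity gap between the uncompressed and compressed model that the adapter closes.

The abstract does not spell out its exact definitions. The function names are illustrative, and the input numbers are hypothetical rather than the paper's measurements.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp(-mean log p(token)), using natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_reduction(ppl_base: float, ppl_new: float) -> float:
    """Fractional perplexity reduction; 0.014 corresponds to a 1.4% improvement."""
    return (ppl_base - ppl_new) / ppl_base


def recovery_fraction(ppl_full: float, ppl_compressed: float, ppl_adapted: float) -> float:
    """Share of compression-induced degradation won back (assumed definition)."""
    gap = ppl_compressed - ppl_full
    if gap <= 0:
        raise ValueError("compression did not degrade perplexity")
    return (ppl_compressed - ppl_adapted) / gap


if __name__ == "__main__":
    # Hypothetical inputs chosen for illustration only.
    print(perplexity([math.log(0.25)] * 4))       # about 4: uniform over 4 choices
    print(relative_reduction(10.0, 9.86))         # about 0.014
    print(recovery_fraction(20.0, 26.0, 21.02))   # about 0.83
```

Under these definitions, a 1.4% improvement and an 83% recovery are dimensionless ratios. They can be compared across models of very different size and baseline perplexity.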
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Quantum Computing
Memory Scaling
Quantum Hardware
Parameter Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

quantum-enhanced LLMs
Cayley unitary adapters
quantum hardware
perplexity improvement
noise-expressivity phase transition