Compositional Concept Generalization with Variational Quantum Circuits

📅 2025-09-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current AI models, particularly vision-language models (VLMs), show significant limitations in compositional generalization, and earlier tensor-based compositional semantic models did not overcome them. Method: This work trains variational quantum circuits (VQCs) on an image captioning task that requires compositional generalization, interpreting the representations of compositional tensor-based models in Hilbert space. Images are encoded in two ways: a multi-hot encoding of binary image vectors, and an angle/amplitude encoding of image vectors taken from CLIP. The authors conjecture that the greater training efficiency of quantum models will improve performance on these tasks. Contribution/Results: With noisy multi-hot encodings, the model achieves good proof-of-concept results. With CLIP vector inputs, performance is more mixed but still exceeds that of classical compositional models.

📝 Abstract
Compositional generalization is a key facet of human cognition, but it is lacking in current AI tools such as vision-language models. Previous work examined whether a compositional tensor-based sentence semantics could overcome this challenge, but it led to negative results. We conjecture that the increased training efficiency of quantum models will improve performance in these tasks. We interpret the representations of compositional tensor-based models in Hilbert spaces and train Variational Quantum Circuits to learn these representations on an image captioning task requiring compositional generalization. We use two image encoding techniques: a multi-hot encoding (MHE) on binary image vectors and an angle/amplitude encoding on image vectors taken from the vision-language model CLIP. We achieve good proof-of-concept results using noisy MHE. Performance on CLIP image vectors was more mixed, but the model still outperformed classical compositional models.
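The encodings named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names, the bit-flip noise model, and the zero-padding to a power-of-two length are all assumptions made for illustration. It shows a multi-hot vector, one possible way of making it noisy, and the standard textbook forms of amplitude and angle encoding.

```python
import math
import random


def multi_hot(active, size):
    """Binary vector of length `size` with 1s at the indices in `active`."""
    vec = [0] * size
    for i in active:
        vec[i] = 1
    return vec


def add_bit_flip_noise(vec, p, rng):
    """Flip each bit independently with probability p.

    Assumption: the paper's noise model is not given in the abstract;
    independent bit flips are used here only as an example.
    """
    return [1 - b if rng.random() < p else b for b in vec]


def amplitude_encode(x):
    """Zero-pad x to a power-of-two length and scale it to unit norm,
    so the entries can serve as amplitudes of an n-qubit state."""
    size = 1
    while size < len(x):
        size *= 2
    padded = list(x) + [0.0] * (size - len(x))
    norm = math.sqrt(sum(v * v for v in padded))
    if norm == 0:
        raise ValueError("cannot amplitude-encode the zero vector")
    return [v / norm for v in padded]


def angle_encode(x):
    """Map each feature v to the single-qubit state RY(v)|0>,
    whose amplitudes are (cos(v/2), sin(v/2))."""
    return [(math.cos(v / 2), math.sin(v / 2)) for v in x]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = multi_hot([0, 2], 4)
    print(clean, add_bit_flip_noise(clean, 0.1, rng))
    print(amplitude_encode([3.0, 4.0, 0.0]))
    print(angle_encode([0.0, math.pi]))
```

Amplitude encoding packs a vector of length 2^n into n qubits, while angle encoding uses one qubit per feature; which trade-off the authors chose for each CLIP vector is not stated in the abstract.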
Problem

Research questions and friction points this paper is trying to address.

Addressing compositional generalization in vision-language models
Training variational quantum circuits for image captioning tasks
Evaluating quantum models against classical compositional approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Variational quantum circuits trained to learn compositional representations
Tensor-based compositional representations interpreted in Hilbert space
MHE and CLIP-based image encodings
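As a toy illustration of what training a variational quantum circuit involves, the sketch below simulates a one-qubit circuit with a single RY rotation and fits its Pauli-Z expectation to a target value. It is not the circuit, loss, or optimizer used in the paper. The function names, the squared-error loss, and the hyperparameters are assumptions. The gradient is computed with the parameter-shift rule, a standard technique for rotation gates.

```python
import math


def ry_state(theta):
    """State RY(theta)|0> as real amplitudes (a0, a1)."""
    return (math.cos(theta / 2), math.sin(theta / 2))


def expect_z(theta):
    """Expectation value of Pauli-Z after the one-gate circuit."""
    a0, a1 = ry_state(theta)
    return a0 * a0 - a1 * a1


def parameter_shift_grad(theta):
    """Gradient of expect_z via the parameter-shift rule:
    evaluate the circuit at theta +/- pi/2 and halve the difference."""
    shift = math.pi / 2
    return (expect_z(theta + shift) - expect_z(theta - shift)) / 2


def train(target, theta=0.3, lr=0.2, steps=200):
    """Gradient descent on the squared error (expect_z(theta) - target)**2."""
    for _ in range(steps):
        error = expect_z(theta) - target
        theta -= lr * 2 * error * parameter_shift_grad(theta)
    return theta


if __name__ == "__main__":
    theta = train(target=0.0)
    print(theta, expect_z(theta))
```

The same loop structure carries over to larger circuits: only the state simulation and the number of parameters change, with one pair of shifted evaluations per parameter.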
Hala Hawashin
Computer Science, University College London, London, UK
Mina Abbaszadeh
Computer Science, University College London, London, UK
Nicholas Joseph
Computer Science, University College London, London, UK
Beth Pearson
School of Eng. Maths. & Tech, University of Bristol, Bristol, UK
Martha Lewis
University of Bristol
Artificial Intelligence, Cognitive Science, Conceptual Spaces, Quantum Theory
Mehrnoosh Sadrzadeh
Professor of Computer Science, Royal Academy of Engineering Research Chair, University College London
Logic, Categorial Grammars, Compositional Distributional Semantics, Machine Learning