🤖 AI Summary
This work addresses the challenge of efficiently compiling large-scale natural language text into parameterized quantum circuits (PQCs). We propose a quantum-aware modeling framework grounded in pregroup grammar and symmetric monoidal category theory, enabling end-to-end compilation of texts of up to 6,410 words into interpretable DisCoCirc quantum circuits built from tree-like representations of pregroup diagrams, described as the first such scalable and semantically transparent quantum NLP representation for long texts. The method exploits the structural correspondence between linguistic composition and quantum operations to generate PQCs comprising thousands of quantum gates; the implementation is open source and integrated into the lambeq Gen II toolkit. Empirical evaluation demonstrates strong performance on downstream tasks including text classification and natural language inference, validating both expressivity and practical utility. This work establishes a new paradigm for quantum natural language processing that bridges theoretical rigor, rooted in categorical quantum mechanics, with engineering feasibility and scalability.
📝 Abstract
Quantum approaches to natural language processing (NLP) are redefining how linguistic information is represented and processed. While traditional hybrid quantum-classical models rely heavily on classical neural networks, recent work proposes a novel framework, DisCoCirc, that directly encodes entire documents as parameterised quantum circuits (PQCs) and offers additional interpretability and compositionality benefits. Following these ideas, this paper introduces an efficient methodology for converting large-scale texts into quantum circuits using tree-like representations of pregroup diagrams. Exploiting the compositional parallels between language and quantum mechanics, grounded in symmetric monoidal categories, our approach enables faithful and efficient encoding of the syntactic and discourse relationships in long and complex texts (up to 6410 words in our experiments) as quantum circuits. The developed system is provided to the community as part of the augmented open-source quantum NLP package lambeq Gen II.
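To make the idea of compiling tree-like sentence representations into a text-level circuit more concrete, here is a toy sketch in plain Python. It is not the paper's algorithm and does not use the lambeq API: the `Node` class, the `compile_text` function, and the convention that leaves are nouns (wires) while inner nodes are words acting on them (gates) are all assumptions made for illustration. It shows only the general DisCoCirc-style intuition that a noun mentioned in several sentences keeps a single wire, so that successive sentences compose into one circuit.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One word in a tree-shaped sentence derivation (hypothetical structure)."""
    word: str
    children: list = field(default_factory=list)


def compile_text(trees):
    """Flatten a list of sentence trees into a gate list over shared noun wires.

    Leaves are treated as nouns: each distinct noun gets one wire (qubit index),
    reused every time the noun reappears. Inner nodes are treated as words that
    act on the wires of their arguments and become placeholder gates.
    """
    wires = {}   # noun -> qubit index, shared across all sentences
    gates = []   # (word, qubits) pairs in application order

    def visit(node):
        if not node.children:
            # Leaf: allocate a wire on first mention, reuse it afterwards.
            return [wires.setdefault(node.word, len(wires))]
        qubits = []
        for child in node.children:
            for q in visit(child):
                if q not in qubits:
                    qubits.append(q)
        # Post-order: arguments are processed before the word acting on them.
        gates.append((node.word, tuple(qubits)))
        return qubits

    for tree in trees:
        visit(tree)
    return gates, wires


# "Alice loves old Bob. Bob hates Claire."
text = [
    Node("loves", [Node("Alice"), Node("old", [Node("Bob")])]),
    Node("hates", [Node("Bob"), Node("Claire")]),
]
gates, wires = compile_text(text)
for word, qubits in gates:
    print(f"{word}: qubits {qubits}")
```

In this sketch each gate is only a label with the wires it touches; in a real pipeline every such label would be replaced by a parameterised sub-circuit chosen by an ansatz. Because each tree is visited once and each noun lookup is a dictionary access, the flattening step scales linearly with the number of words, which is the kind of property that matters for texts thousands of words long.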