🤖 AI Summary
To address the parameter explosion that arises when modeling high-order feature interactions in multimodal learning, this paper proposes the Quantum Fusion Layer (QFL), a differentiable hybrid quantum-classical fusion mechanism. QFL uses parameterized quantum circuits to learn entangled cross-modal feature interactions, drawing on quantum signal processing principles. The authors argue that the quantum component can represent high-order polynomial interactions across modalities with linear parameter growth, and they give a separation example against low-rank tensor-based methods that points to a potential quantum query advantage. Quantum state encoding and the variational circuit are trained jointly, so QFL supports end-to-end optimization. In simulation, QFL consistently outperforms strong classical baselines on small but diverse multimodal tasks, with the largest gains when the number of modalities is high. These results suggest that QFL is a promising and scalable approach to multimodal fusion that merits evaluation on larger systems.
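The "parameter explosion" that QFL targets can be made concrete by counting weights. The sketch below is an illustration under stated assumptions, not the paper's own accounting: full tensor fusion is modeled on Tensor Fusion Network-style outer products of each modality vector padded with a constant 1, low-rank fusion on LMF-style per-modality factors, and the quantum side as a hypothetical ansatz with one trainable angle per qubit per layer. The helper names, feature sizes, and qubit counts are illustrative only.

```python
from math import prod

def tensor_fusion_params(dims, out_dim):
    # Full tensor fusion: outer product of every modality vector (each padded
    # with a constant 1), then a linear map to out_dim. Exponential in len(dims).
    return prod(d + 1 for d in dims) * out_dim

def low_rank_fusion_params(dims, out_dim, rank):
    # Low-rank factorization: one (d + 1) x out_dim factor per modality
    # and per rank component. Additive (linear) in len(dims).
    return rank * sum((d + 1) * out_dim for d in dims)

def pqc_fusion_params(num_modalities, qubits_per_modality, layers):
    # Hypothetical circuit ansatz: one trainable rotation angle per qubit
    # per layer, so the count is linear in the number of modalities.
    return layers * num_modalities * qubits_per_modality

if __name__ == "__main__":
    for m in range(2, 6):
        dims = [32] * m  # illustrative: 32-dim features for every modality
        print(m,
              tensor_fusion_params(dims, 16),
              low_rank_fusion_params(dims, 16, rank=4),
              pqc_fusion_params(m, qubits_per_modality=4, layers=2))
```

With 32-dimensional features, each added modality multiplies the full tensor count by 33, whereas the low-rank and circuit counts grow additively. Because low-rank methods already scale linearly in parameters, the abstract frames QFL's advantage over them as a query separation rather than a raw parameter count.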
📝 Abstract
The aim of this paper is to introduce a quantum fusion mechanism for multimodal learning and to establish its theoretical and empirical potential. The proposed method, called the Quantum Fusion Layer (QFL), replaces classical fusion schemes with a hybrid quantum-classical procedure that uses parameterized quantum circuits to learn entangled feature interactions without requiring exponential parameter growth. Supported by quantum signal processing principles, the quantum component efficiently represents high-order polynomial interactions across modalities with linear parameter scaling, and we provide a separation example between QFL and low-rank tensor-based methods that highlights potential quantum query advantages. In simulation, QFL consistently outperforms strong classical baselines on small but diverse multimodal tasks, with particularly marked improvements in high-modality regimes. These results suggest that QFL offers a fundamentally new and scalable approach to multimodal fusion that merits deeper exploration on larger systems.
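To show how a parameterized circuit can mix modalities, here is a minimal pure-Python statevector sketch. It is a hypothetical stand-in, not the paper's QFL circuit: it assumes one qubit per modality, angle encoding of a single scalar feature per modality with an RY gate, layers of trainable RY rotations followed by a ring of CNOTs, and a readout equal to the expectation of Pauli Z on qubit 0. The function names `simulate` and `qfl_forward` are illustrative.

```python
import math

def apply_ry(state, qubit, theta):
    """Apply RY(theta) = [[c, -s], [s, c]] to `qubit` in place (real amplitudes)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    mask = 1 << qubit
    for i in range(len(state)):
        if not i & mask:  # visit each (bit=0, bit=1) amplitude pair once
            j = i | mask
            a, b = state[i], state[j]
            state[i] = c * a - s * b
            state[j] = s * a + c * b

def apply_cnot(state, control, target):
    """Flip `target` on basis states where `control` is 1 (in place)."""
    cmask, tmask = 1 << control, 1 << target
    for i in range(len(state)):
        if i & cmask and not i & tmask:
            j = i | tmask
            state[i], state[j] = state[j], state[i]

def expect_z(state, qubit):
    """<Z> on `qubit`: weight +1 when its bit is 0, -1 when it is 1."""
    mask = 1 << qubit
    return sum((-1.0 if i & mask else 1.0) * amp * amp
               for i, amp in enumerate(state))

def simulate(features, thetas):
    """Hypothetical fusion circuit: encode, then trainable entangling layers."""
    n = len(features)
    state = [0.0] * (2 ** n)
    state[0] = 1.0  # start in |0...0>
    # Encoding: one qubit per modality, RY(x_m) angle encoding
    for q, x in enumerate(features):
        apply_ry(state, q, x)
    # Trainable layers: RY on every qubit, then a CNOT ring across modalities
    for layer in thetas:
        for q, theta in enumerate(layer):
            apply_ry(state, q, theta)
        if n > 1:
            for q in range(n):
                apply_cnot(state, q, (q + 1) % n)
    return state

def qfl_forward(features, thetas):
    """Fused scalar output: <Z> on qubit 0 after the circuit."""
    return expect_z(simulate(features, thetas), 0)

if __name__ == "__main__":
    features = [0.3, 1.1, -0.4]                  # one scalar per modality
    thetas = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 layers x 3 qubits = 6 angles
    print(qfl_forward(features, thetas))
```

Because each feature enters through a single RY(x_m) gate, the readout is a trigonometric polynomial with terms up to degree one in each x_m, including products across all modalities, while the number of trainable angles is only (number of modalities) × (number of layers). This is one standard way to see how entangling circuits can express high-order cross-modal terms with linearly many parameters; the paper's actual encoding, ansatz, and quantum signal processing construction may differ.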