Learning Visual-Semantic Subspace Representations for Propositional Reasoning

📅 2024-05-25
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work addresses the challenge of modeling hierarchical semantic partial-order relations and enabling logical reasoning in joint visual-semantic representation learning. We propose the first nuclear-norm-driven subspace learning framework, in which the image representation space forms a subspace lattice. Semantic entailment and conjunction, two core operations of propositional logic, are realized geometrically as subspace inclusion and subspace intersection, respectively. Using nuclear-norm regularization, projection operators, and spectral geometric embedding, we prove that the optimal solution encodes the spectral geometry of the semantics. Evaluated on visual reasoning benchmarks, our method improves propositional consistency (+12.3%) and hierarchical reasoning accuracy (+9.7%). The framework pairs theoretical guarantees with structured semantic representations, connecting geometric deep learning with formal logic semantics.

📝 Abstract
Learning representations that capture rich semantic relationships and accommodate propositional calculus poses a significant challenge. Existing approaches are either contrastive, lacking theoretical guarantees, or fall short in effectively representing the partial orders inherent to rich visual-semantic hierarchies. In this paper, we propose a novel approach for learning visual representations that not only conform to a specified semantic structure but also facilitate probabilistic propositional reasoning. Our approach is based on a new nuclear norm-based loss. We show that its minimum encodes the spectral geometry of the semantics in a subspace lattice, where logical propositions can be represented by projection operators.
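The idea of representing propositions by projection operators can be illustrated with a small NumPy sketch. This is not the paper's implementation: the helper names (`projector`, `entails`, `conjunction`), the toy labels, and the convention that a more specific proposition occupies a smaller subspace are all assumptions made for illustration. Entailment is checked as subspace inclusion, and conjunction is computed as subspace intersection.

```python
import numpy as np

def projector(basis):
    """Orthogonal projector onto the column span of `basis` (hypothetical helper)."""
    return basis @ np.linalg.pinv(basis)

def entails(p_a, p_b, tol=1e-9):
    """True if span(A) is contained in span(B), i.e. P_B P_A = P_A."""
    return np.allclose(p_b @ p_a, p_a, atol=tol)

def conjunction(p_a, p_b, tol=1e-9):
    """Projector onto the intersection of the two subspaces.

    A vector lies in both subspaces exactly when it is in the null space
    of (I - P_A) + (I - P_B), which is symmetric positive semidefinite.
    """
    n = p_a.shape[0]
    m = 2 * np.eye(n) - p_a - p_b
    eigvals, eigvecs = np.linalg.eigh(m)
    null_basis = eigvecs[:, eigvals < tol]
    return null_basis @ null_basis.T

# Toy labels in R^3 (assumed for illustration only).
e1, e2, e3 = np.eye(3)
p_animal = projector(np.column_stack([e1, e2]))
p_pet = projector(np.column_stack([e1, e3]))
p_dog = projector(e1.reshape(3, 1))

dog_implies_animal = entails(p_dog, p_animal)   # inclusion holds
animal_implies_dog = entails(p_animal, p_dog)   # inclusion fails
p_animal_and_pet = conjunction(p_animal, p_pet) # intersection is span(e1)
```

In this toy setup the intersection of the "animal" and "pet" subspaces coincides with the "dog" subspace, so the conjunction projector equals `p_dog`.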

Problem

Research questions and friction points this paper is trying to address.

Learning semantic image representations with theoretical guarantees
Encoding partial orders in visual-semantic structured data
Associating logical propositions with geometric subspace representations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Nuclear norm-based loss function
Encodes spectral geometry in subspace lattice
Associates logical propositions with subspaces
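As background for the nuclear norm mentioned above, the following sketch computes it as the sum of singular values of a feature matrix. It does not reproduce the paper's loss, whose exact form is not given on this page; the function name `nuclear_norm` and the toy matrices are assumptions. The comparison shows that unit-norm features sharing one direction yield a smaller nuclear norm than unit-norm features spanning orthogonal directions, which is why the quantity is sensitive to the dimension of the spanned subspace.

```python
import numpy as np

def nuclear_norm(x):
    """Sum of the singular values of `x` (hypothetical helper)."""
    return np.linalg.svd(x, compute_uv=False).sum()

# Rows are unit-norm feature vectors (toy data, assumed for illustration).
collinear = np.array([[1.0, 0.0],
                      [1.0, 0.0]])   # both features span a single direction
orthogonal = np.array([[1.0, 0.0],
                       [0.0, 1.0]])  # features span two directions

nn_collinear = nuclear_norm(collinear)    # singular values: sqrt(2), 0
nn_orthogonal = nuclear_norm(orthogonal)  # singular values: 1, 1

# NumPy exposes the same quantity through np.linalg.norm with ord='nuc'.
nn_builtin = np.linalg.norm(collinear, ord="nuc")
```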
Gabriel Moreira
Language Technologies Institute, Institute for Systems and Robotics
Alexander Hauptmann
Language Technologies Institute
Manuel Marques
Institute for Systems and Robotics (ISR/IST), LARSyS, Instituto Superior Técnico, Univ Lisboa
Computer Vision, Image Processing, Assistive Robotics, Urban Mobility
J. Costeira
Institute for Systems and Robotics