🤖 AI Summary
This work addresses the challenge of modeling hierarchical semantic partial-order relations and enabling logical reasoning in vision–semantics joint representation learning. We propose the first nuclear-norm-driven subspace learning framework, wherein the image representation space naturally forms a subspace lattice. Semantic entailment and conjunction—core propositional logic operations—are geometrically realized as subspace inclusion and intersection, respectively. Leveraging nuclear-norm regularization, projection operator modeling, and spectral geometric embedding, we theoretically prove that the optimal solution exactly encodes the semantic spectral geometric structure. Evaluated on visual reasoning benchmarks, our method achieves significant improvements in propositional consistency (+12.3%) and hierarchical reasoning accuracy (+9.7%). The framework provides both rigorous theoretical guarantees and expressive structured semantic representations, bridging geometric deep learning with formal logic semantics.
📝 Abstract
Learning representations that capture rich semantic relationships and accommodate propositional calculus poses a significant challenge. Existing approaches are either contrastive, lacking theoretical guarantees, or fall short in effectively representing the partial orders inherent to rich visual-semantic hierarchies. In this paper, we propose a novel approach for learning visual representations that not only conform to a specified semantic structure but also facilitate probabilistic propositional reasoning. Our approach is based on a new nuclear norm-based loss. We show that its minimum encodes the spectral geometry of the semantics in a subspace lattice, where logical propositions can be represented by projection operators.