Learning Visual Hierarchies in Hyperbolic Space for Image Retrieval

📅 2024-11-26

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

To address the lack of semantic hierarchy modeling and of explicit hierarchical labels in image retrieval, this paper proposes what it describes as the first method for encoding user-defined, multi-level visual hierarchies in hyperbolic space without explicit hierarchical labels. Methodologically, the authors embed images in hyperbolic space and train them with a contrastive loss built on pairwise entailment metrics, using a part-based hierarchy defined from object-level annotations within and across images; they also propose new evaluation metrics for hierarchical retrieval. The contributions are threefold: (1) a learning paradigm that combines hyperbolic geometry with entailment-based training to represent visual hierarchies, presented by the authors as the first of its kind; (2) embeddings that capture semantic and structural information beyond mere visual similarity, learned without hierarchical supervision; and (3) significant improvements on part-based image retrieval under hierarchical evaluation.
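The summary does not say which model of hyperbolic space the paper uses. As a minimal sketch, assuming the Poincaré ball with curvature -1, the snippet below maps a Euclidean encoder output onto the ball with the exponential map at the origin and ranks a gallery by geodesic distance. The function names `expmap0`, `poincare_distance`, and `rank_gallery` are hypothetical, not the paper's API.

```python
import math
from typing import List, Sequence

Vector = List[float]


def norm(v: Sequence[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def expmap0(v: Sequence[float]) -> Vector:
    """Exponential map at the origin of the Poincaré ball (curvature -1):
    tanh(|v|) * v / |v|. The result lies inside the unit ball, up to
    floating-point saturation of tanh for very large |v|."""
    n = norm(v)
    if n == 0.0:
        return [0.0 for _ in v]
    scale = math.tanh(n) / n
    return [scale * x for x in v]


def poincare_distance(u: Sequence[float], v: Sequence[float], eps: float = 1e-9) -> float:
    """Geodesic distance: arccosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2)))."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(u, v))
    # eps guards against division by zero for points numerically on the boundary
    denom = max((1.0 - norm(u) ** 2) * (1.0 - norm(v) ** 2), eps)
    return math.acosh(1.0 + 2.0 * diff_sq / denom)


def rank_gallery(query: Sequence[float], gallery: List[Vector]) -> List[int]:
    """Indices of gallery embeddings ordered from nearest to farthest from the query."""
    dists = [poincare_distance(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: dists[i])


if __name__ == "__main__":
    query = expmap0([0.1, 0.2])
    gallery = [expmap0([0.1, 0.25]), expmap0([2.0, -1.0]), expmap0([0.0, 0.0])]
    print(rank_gallery(query, gallery))  # [0, 2, 1]
```

Because the volume of hyperbolic space grows exponentially with radius, tree-like hierarchies can be embedded with low distortion, which is the usual motivation for choosing this geometry over Euclidean space.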


📝 Abstract

Structuring latent representations in a hierarchical manner enables models to learn patterns at multiple levels of abstraction. However, most prevalent image understanding models focus on visual similarity, and learning visual hierarchies is relatively unexplored. In this work, for the first time, we introduce a learning paradigm that can encode user-defined multi-level complex visual hierarchies in hyperbolic space without requiring explicit hierarchical labels. As a concrete example, first, we define a part-based image hierarchy using object-level annotations within and across images. Then, we introduce an approach to enforce the hierarchy using contrastive loss with pairwise entailment metrics. Finally, we discuss new evaluation metrics to effectively measure hierarchical image retrieval. Encoding these complex relationships ensures that the learned representations capture semantic and structural information that transcends mere visual similarity. Experiments in part-based image retrieval show significant improvements in hierarchical retrieval tasks, demonstrating the capability of our model in capturing visual hierarchies.
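The abstract does not spell out how the part-based hierarchy is built from object-level annotations. One plausible reading, sketched below as an assumption, is spatial containment within a single image: a crop whose box strictly encloses another box sits above it in the hierarchy, and the whole image is the root. The `Box` type and the `entailment_pairs` helper are hypothetical, and the cross-image part of the hierarchy is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box (x0, y0, x1, y1); the coordinate format is an assumption."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

    def contains(self, other: "Box") -> bool:
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and self.x1 >= other.x1 and self.y1 >= other.y1)


def entailment_pairs(boxes: List[Box]) -> List[Tuple[int, int]]:
    """All (general, specific) index pairs where the first box strictly encloses
    the second, i.e. the specific crop is a part of the general crop."""
    pairs = []
    for i, outer in enumerate(boxes):
        for j, inner in enumerate(boxes):
            if i != j and outer.contains(inner) and outer.area() > inner.area():
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    boxes = [
        Box(0, 0, 100, 100),  # whole image (root)
        Box(10, 10, 60, 60),  # region grouping several objects
        Box(20, 20, 40, 40),  # object inside that region
        Box(70, 70, 90, 90),  # object elsewhere in the image
    ]
    print(entailment_pairs(boxes))  # [(0, 1), (0, 2), (0, 3), (1, 2)]
```

The pairs are transitive (the root entails every crop, not only its direct children), which in this sketch gives a pairwise objective more supervision than a parent-only tree would.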

Problem

Research questions and friction points this paper is trying to address.

Learning visual hierarchies in hyperbolic space for image retrieval

Encoding user-defined multi-level visual hierarchies without explicit hierarchical labels

Improving hierarchical image retrieval using contrastive loss and entailment metrics
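The summary names a contrastive loss with pairwise entailment metrics but not their exact form. As a stand-in, the sketch below uses the hyperbolic entailment-cone energy of Ganea et al. (2018) in the Poincaré ball together with a standard InfoNCE loss over negative geodesic distances. How the two terms are combined, the aperture constant `K = 0.1`, the temperature `tau`, the `weight`, and every function name are assumptions for illustration, not the paper's formulation.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def clamp(x: float, lo: float = -1.0, hi: float = 1.0) -> float:
    return max(lo, min(hi, x))


def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance in the Poincaré ball (curvature -1)."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = max((1.0 - dot(u, u)) * (1.0 - dot(v, v)), eps)
    return math.acosh(1.0 + 2.0 * diff_sq / denom)


def half_aperture(x, K=0.1, eps=1e-9):
    """Half-aperture psi(x) = arcsin(K (1 - |x|^2) / |x|) of the cone rooted at x."""
    n = math.sqrt(dot(x, x))
    return math.asin(clamp(K * (1.0 - n * n) / max(n, eps)))


def exterior_angle(x, y, eps=1e-9):
    """Angle Xi(x, y) between the cone axis at x and the geodesic from x to y."""
    xy, xx, yy = dot(x, y), dot(x, x), dot(y, y)
    num = xy * (1.0 + xx) - xx * (1.0 + yy)
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    den = math.sqrt(xx) * dist * math.sqrt(max(1.0 + xx * yy - 2.0 * xy, eps))
    return math.acos(clamp(num / max(den, eps)))


def entailment_energy(general, specific, K=0.1):
    """Zero when `specific` lies inside the entailment cone of `general`."""
    return max(0.0, exterior_angle(general, specific) - half_aperture(general, K))


def info_nce(anchor, positive, negatives: List[Sequence[float]], tau=0.1):
    """InfoNCE with logits = -distance / tau; the positive sits at index 0."""
    logits = [-poincare_distance(anchor, positive) / tau]
    logits += [-poincare_distance(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


def hierarchical_loss(general, specific, negatives, weight=1.0):
    """Contrastive term pulls the pair together; entailment term orders it."""
    return info_nce(specific, general, negatives) + weight * entailment_energy(general, specific)
```

In this sketch the entailment energy is asymmetric: a specific crop embedded farther from the origin along the direction of its general crop costs nothing, while the reversed order is penalized. A symmetric distance-based contrastive term alone cannot express that ordering.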

Innovation

Methods, ideas, or system contributions that make the work stand out.

Encodes visual hierarchies in hyperbolic space

Uses contrastive loss with entailment metrics

Introduces metrics for hierarchical image retrieval
