Through the PRISm: Importance-Aware Scene Graphs for Image Retrieval

📅 2025-12-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Retrieving semantically similar images requires modeling scene-level relationships and contextual dependencies, which conventional similarity-based methods struggle to capture. To address this, PRISm introduces a multimodal framework built on importance-aware scene graph modeling, a capability largely absent in prior approaches. First, an Importance Prediction Module prunes the scene graph, retaining the most semantically critical objects and relational triplets while discarding irrelevant ones. Second, an Edge-Aware Graph Neural Network (GNN) jointly models relational structure and global visual features, enabling interpretable reasoning that aligns with human perception. Third, it combines relational reasoning and visual representation into a single, semantically informed image embedding. Experiments on benchmark and real-world datasets show consistently superior top-ranked retrieval performance compared with prior methods, and qualitative analyses show that PRISm accurately identifies key objects and their interactions, yielding semantically coherent and interpretable retrieval results.
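The pruning step can be pictured as ranking relational triplets by a predicted importance score and keeping only the top fraction. The sketch below is a minimal illustration under stated assumptions: the scores are made up and stand in for the output of an importance predictor, and `prune_scene_graph`, `keep_ratio`, and `min_keep` are hypothetical names and hyperparameters, not the paper's implementation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    subject: str
    predicate: str
    obj: str


def prune_scene_graph(triplets, scores, keep_ratio=0.5, min_keep=1):
    """Keep the most important relational triplets of a scene graph.

    scores: one importance score per triplet (hypothetical stand-in for a
    learned importance predictor). keep_ratio and min_keep are illustrative
    hyperparameters, not values from the paper.
    """
    if len(triplets) != len(scores):
        raise ValueError("expected exactly one importance score per triplet")
    k = min(len(triplets), max(min_keep, math.ceil(keep_ratio * len(triplets))))
    # Rank by descending score; sorted() is stable, so ties keep input order.
    ranked = sorted(range(len(triplets)), key=lambda i: -scores[i])
    keep = sorted(ranked[:k])  # restore the original triplet order
    return [triplets[i] for i in keep]


graph = [
    Triplet("man", "riding", "horse"),
    Triplet("horse", "on", "grass"),
    Triplet("cloud", "in", "sky"),
    Triplet("man", "wearing", "hat"),
]
scores = [0.95, 0.60, 0.10, 0.40]  # made-up scores for illustration only
kept = prune_scene_graph(graph, scores, keep_ratio=0.5)
for t in kept:
    print(t.subject, t.predicate, t.obj)
# man riding horse
# horse on grass
```

A fixed keep ratio is the simplest possible policy; an adaptive scheme might instead threshold the scores, which this sketch does not attempt.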

📝 Abstract

Accurately retrieving images that are semantically similar remains a fundamental challenge in computer vision, as traditional methods often fail to capture the relational and contextual nuances of a scene. We introduce PRISm (Pruning-based Image Retrieval via Importance Prediction on Semantic Graphs), a multimodal framework that advances image-to-image retrieval through two novel components. First, the Importance Prediction Module identifies and retains the most critical objects and relational triplets within an image while pruning irrelevant elements. Second, the Edge-Aware Graph Neural Network explicitly encodes relational structure and integrates global visual features to produce semantically informed image embeddings. PRISm achieves image retrieval that closely aligns with human perception by explicitly modeling the semantic importance of objects and their interactions, capabilities largely absent in prior approaches. Its architecture effectively combines relational reasoning with visual representation, enabling semantically grounded retrieval. Extensive experiments on benchmark and real-world datasets demonstrate consistently superior top-ranked performance, while qualitative analyses show that PRISm accurately captures key objects and interactions, producing interpretable and semantically meaningful results.
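What makes a GNN "edge-aware" is that the message sent along an edge depends on the edge's own features (here, the predicate of a triplet), not only on the sending node. The following is a minimal NumPy sketch of one such message-passing step, assuming random stand-in features and weights and mean pooling into a graph vector; the function `edge_aware_layer` and all shapes are hypothetical and do not reproduce PRISm's exact layer.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def edge_aware_layer(h, edges, edge_feats, W_self, W_msg):
    """One edge-conditioned message-passing step (illustrative only).

    h:          (N, d) node (object) features
    edges:      list of (src, dst) pairs, one per triplet (subject -> object)
    edge_feats: (E, d_e) predicate features; row k belongs to edges[k]
    W_self:     (d, d_out) weights applied to each node's own features
    W_msg:      (d + d_e, d_out) weights applied to [sender ; edge] features
    """
    n = h.shape[0]
    agg = np.zeros((n, W_msg.shape[1]))
    deg = np.zeros(n)
    for k, (src, dst) in enumerate(edges):
        # The message depends on both the sender and the predicate on the edge.
        msg = np.concatenate([h[src], edge_feats[k]]) @ W_msg
        agg[dst] += msg
        deg[dst] += 1
    agg /= np.maximum(deg, 1.0)[:, None]  # mean over incoming messages; none -> 0
    return relu(h @ W_self + agg)


# Toy graph: man -riding-> horse -on-> grass, with random stand-in values.
rng = np.random.default_rng(0)
d, d_e, d_out = 8, 4, 16
h = rng.normal(size=(3, d))
edges = [(0, 1), (1, 2)]
edge_feats = rng.normal(size=(len(edges), d_e))
W_self = rng.normal(size=(d, d_out))
W_msg = rng.normal(size=(d + d_e, d_out))

h_new = edge_aware_layer(h, edges, edge_feats, W_self, W_msg)
graph_vec = h_new.mean(axis=0)  # simple mean pooling into one graph-level vector
print(h_new.shape, graph_vec.shape)  # (3, 16) (16,)
```

Messages here flow only from subject to object; a bidirectional variant would add a second message per edge in the reverse direction.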

Problem

Research questions and friction points this paper is trying to address.

Enhances image retrieval by modeling semantic importance of objects and relations.

Prunes irrelevant elements to retain critical objects and relational triplets.

Integrates relational reasoning with visual features for semantically grounded retrieval.
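The source does not say how the relational (graph) representation and the global visual feature are combined. One plausible option, shown below purely as an assumption, is to project both into a shared space, L2-normalize each, and take a weighted average. `fuse_embeddings`, `W_graph`, `W_visual`, and `alpha` are hypothetical names, and the random inputs are placeholders.

```python
import numpy as np


def fuse_embeddings(graph_vec, visual_vec, W_graph, W_visual, alpha=0.5):
    """Blend a scene-graph embedding with a global visual embedding.

    Both inputs are projected into a shared space (W_graph and W_visual are
    hypothetical learned matrices), L2-normalized so neither modality
    dominates by scale, mixed with weight alpha, then re-normalized so the
    result can be compared by cosine similarity.
    """
    zg = graph_vec @ W_graph
    zv = visual_vec @ W_visual
    zg = zg / (np.linalg.norm(zg) + 1e-12)
    zv = zv / (np.linalg.norm(zv) + 1e-12)
    z = alpha * zg + (1.0 - alpha) * zv
    return z / (np.linalg.norm(z) + 1e-12)


rng = np.random.default_rng(1)
graph_vec = rng.normal(size=16)   # placeholder for a pooled GNN output
visual_vec = rng.normal(size=32)  # placeholder for a global image feature
W_graph = rng.normal(size=(16, 64))
W_visual = rng.normal(size=(32, 64))

z = fuse_embeddings(graph_vec, visual_vec, W_graph, W_visual)
print(z.shape, round(float(np.linalg.norm(z)), 6))  # (64,) 1.0
```

Concatenation followed by a learned projection is an equally reasonable alternative; the weighted average is used here only because it keeps the example short.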

Innovation

Methods, ideas, or system contributions that make the work stand out.

Importance Prediction Module prunes irrelevant elements

Edge-Aware Graph Neural Network encodes relational structure

Framework combines relational reasoning with visual representation
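Once every image has a single embedding, image-to-image retrieval reduces to ranking a gallery by similarity to the query. The sketch below assumes cosine similarity, which the source does not specify; `top_k_retrieval` is a hypothetical helper, and the 2-D vectors are toy data rather than real embeddings.

```python
import numpy as np


def top_k_retrieval(query, gallery, k=5):
    """Rank gallery embeddings by cosine similarity to the query; return the top k."""
    q = query / np.linalg.norm(query)
    G = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = G @ q
    k = min(k, len(sims))
    # Stable sort on negated similarity gives descending order with ties kept.
    order = np.argsort(-sims, kind="stable")[:k]
    return order, sims[order]


gallery = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.7, 0.7],
    [-1.0, 0.0],
])
query = np.array([1.0, 0.1])
idx, sim_scores = top_k_retrieval(query, gallery, k=2)
print(idx)  # [0 2]
```

For large galleries, an approximate nearest-neighbor index would replace the brute-force matrix product, but the ranking logic stays the same.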
