🤖 AI Summary
To address semantic representation inconsistency and weak inter-class discriminability in image classification and retrieval, this paper proposes a graph attention network (GAT)-based autoencoder framework for learning representative embeddings. The method builds a similarity-based graph whose nodes represent images (or their representatives) and whose edges capture fine-grained structural relationships, and employs a GAT-based autoencoder to learn context-aware, class-aware latent embeddings in an end-to-end manner. Its key innovation lies in its representative-centric design: category representatives are derived from the learned embeddings, a query image is categorized by comparing its representative against the category representatives, and retrieval is then performed within the identified category. Experiments demonstrate that the proposed method is effective compared with standard feature-based techniques in both categorization accuracy and retrieval quality.
📝 Abstract
We propose a method for image categorization and retrieval that leverages graphs and a graph attention network (GAT)-based autoencoder. Our approach is representative-centric: we perform categorization and retrieval via representative models constructed for the images and image categories. We use a graph whose nodes represent images (or their representatives) and whose edges capture similarity relationships. The GAT highlights important features and relationships between images, enabling the autoencoder to construct context-aware latent representations that capture the key features of each image relative to its neighbors. We derive category representatives from these embeddings and categorize a query image by comparing its representative to the category representatives. We then retrieve the most similar image to the query within its identified category. We demonstrate the effectiveness of our representative-centric approach through experiments with both the GAT autoencoder and standard feature-based techniques.
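The pipeline described in the abstract can be sketched in a few lines of numpy: a single-head graph attention layer aggregates each image's features over its similarity neighbors, category representatives are formed from the resulting embeddings, and a query is categorized by nearest representative before within-category retrieval. Note this is a minimal illustrative sketch, not the paper's exact architecture: the single attention head, the cosine-similarity edge threshold, and mean-pooled category representatives are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def gat_layer(x, adj, W, a_src, a_dst):
    """Single-head graph attention layer (GAT-style).

    x: (N, F) node features; adj: (N, N) 0/1 adjacency with self-loops;
    W: (F, F') projection; a_src, a_dst: (F',) attention parameter vectors.
    """
    h = x @ W                                    # projected node features
    # e[i, j] = LeakyReLU(a_src . h_i + a_dst . h_j) for every node pair
    e = (h @ a_src)[:, None] + (h @ a_dst)[None, :]
    e = np.where(e > 0, e, 0.2 * e)              # LeakyReLU, slope 0.2
    e = np.where(adj > 0, e, -1e9)               # mask non-edges
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)    # softmax over each node's neighbors
    return np.tanh(alpha @ h)                    # attention-weighted aggregation

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

def classify_and_retrieve(query_emb, emb, labels):
    """Representative-centric categorization, then within-category retrieval."""
    # category representative = mean embedding of that category (an assumption)
    reps = {c: emb[labels == c].mean(axis=0) for c in np.unique(labels)}
    cat = max(reps, key=lambda c: cos(query_emb, reps[c]))
    idx = np.flatnonzero(labels == cat)
    best = idx[np.argmax([cos(query_emb, emb[i]) for i in idx])]
    return cat, best

# Toy data: 6 images in 2 categories, 4-dim features; query resembles category 0.
x = np.vstack([rng.normal(loc=[1, 1, 0, 0], scale=0.05, size=(3, 4)),
               rng.normal(loc=[0, 0, 1, 1], scale=0.05, size=(3, 4))])
labels = np.array([0, 0, 0, 1, 1, 1])
q = np.array([0.95, 1.05, 0.0, 0.1])

# Similarity graph over images + query: edge where cosine similarity > 0.8.
feats = np.vstack([x, q])
normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
adj = (normed @ normed.T > 0.8).astype(float)

W = 0.5 * rng.normal(size=(4, 8))
a_src, a_dst = rng.normal(size=8), rng.normal(size=8)
emb = gat_layer(feats, adj, W, a_src, a_dst)

cat, best = classify_and_retrieve(emb[-1], emb[:-1], labels)
```

Here the query is embedded jointly with the images, then matched against the category representatives; the most similar image is sought only inside the predicted category, which is what makes the approach representative-centric rather than a flat nearest-neighbor search over all images.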