Zero-Shot Hashing Based on Reconstruction With Part Alignment

📅 2025-03-10
🤖 AI Summary
Existing zero-shot hashing methods rely on global, image-level semantic attribute alignment, neglecting correspondences between local regions and fine-grained part-level attributes—leading to noise interference and inaccurate alignment. To address this, we propose the first part-level zero-shot hashing framework explicitly designed for pixel-level semantic reconstruction. Our approach first localizes discriminative local regions via image patch clustering, then establishes a part-level attribute alignment mechanism. Further, we introduce a differentiable attribute vector replacement and reconstruction optimization module to enable end-to-end hash learning. Evaluated on multiple standard zero-shot hashing benchmarks, our method consistently outperforms state-of-the-art approaches, achieving up to a 12.6% improvement in mean Average Precision (mAP). These results empirically validate that part-level semantic alignment is critical for enhancing cross-category generalization in zero-shot hashing.

📝 Abstract
Hashing algorithms have been widely used in large-scale image retrieval tasks, especially for seen-class data. Zero-shot hashing algorithms have been proposed to handle unseen-class data. The key technique in these algorithms involves learning features from seen classes and transferring them to unseen classes, that is, aligning the feature embeddings between the seen and unseen classes. Most existing zero-shot hashing algorithms use the attributes shared between seen and unseen classes to complete the alignment task. However, the attributes are always described for a whole image, even though they represent specific parts of the image. Hence, these methods ignore the importance of aligning attributes with the corresponding image parts, which introduces noise and reduces the accuracy of aligning the features of seen and unseen classes. To address this problem, we propose a new zero-shot hashing method called RAZH. We first use a clustering algorithm to group similar patches into image parts for attribute matching and then replace the image parts with the corresponding attribute vectors, gradually aligning each part with its nearest attribute. Extensive evaluation results demonstrate the superiority of the RAZH method over several state-of-the-art methods.
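
For readers new to hashing-based retrieval, the general idea the abstract builds on can be sketched as follows. This is a generic illustration, not the RAZH hash function: the fixed `projection` matrix and the helper names `hash_codes` and `hamming_rank` are assumptions made for this example, whereas a real method would learn the mapping from features to codes.

```python
import numpy as np

def hash_codes(features, projection):
    """Binarize real-valued features into {0, 1} codes.

    Each bit is 1 where the linear projection is positive. The
    projection matrix is a stand-in for a learned hash function.
    """
    return (features @ projection > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code."""
    dist = (db_codes != query_code).sum(axis=1)
    order = np.argsort(dist, kind="stable")  # stable: ties keep database order
    return order, dist

# Toy data: 4-dimensional features, identity projection (hypothetical).
projection = np.eye(4)
database = np.array([
    [1.0, 1.0, -1.0, -1.0],
    [1.0, -1.0, -1.0, -1.0],
    [-1.0, -1.0, 1.0, 1.0],
])
query = np.array([1.0, 1.0, -1.0, -1.0])

db_codes = hash_codes(database, projection)
query_code = hash_codes(query[None, :], projection)[0]
order, dist = hamming_rank(query_code, db_codes)
```

Comparing short binary codes by Hamming distance is what makes hashing attractive at scale; the zero-shot setting adds the requirement that codes for unseen classes remain meaningful.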

Problem

Research questions and friction points this paper is trying to address.

Aligns image parts with attributes for zero-shot hashing.
Improves feature alignment between seen and unseen classes.
Reduces noise in attribute-based image retrieval tasks.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Clustering algorithm groups patches for attribute matching.
Image parts replaced with corresponding attribute vectors.
Gradual alignment of parts with nearest attributes.
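
The three steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes patch features and attribute vectors live in the same embedding space, uses plain k-means with a deterministic initialization as the clustering step, matches parts to attributes by cosine similarity, and models "gradual" replacement with a hypothetical mixing weight `alpha`. All function names are illustrative.

```python
import numpy as np

def assign(x, centroids):
    """Label each row of x with the index of its nearest centroid."""
    d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def kmeans(x, k, iters=20):
    """Plain k-means; initial centroids are evenly spaced rows of x
    (a simplification that keeps the sketch deterministic)."""
    idx = np.linspace(0, len(x) - 1, k).astype(int)
    centroids = x[idx].astype(float)
    for _ in range(iters):
        labels = assign(x, centroids)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, assign(x, centroids)

def nearest_attribute(parts, attributes):
    """For each part centroid, index of the most cosine-similar attribute."""
    p = parts / np.linalg.norm(parts, axis=1, keepdims=True)
    a = attributes / np.linalg.norm(attributes, axis=1, keepdims=True)
    return (p @ a.T).argmax(axis=1)

def replace_parts(patches, labels, attributes, part_to_attr, alpha=1.0):
    """Blend each patch toward the attribute vector matched to its part.

    alpha is a hypothetical schedule parameter: 0 keeps the original
    patches, 1 replaces them entirely.
    """
    target = attributes[part_to_attr[labels]]
    return (1.0 - alpha) * patches + alpha * target

# Toy example: 10 patch features drawn around two of three attribute vectors.
rng = np.random.default_rng(0)
attributes = np.eye(3)
patches = np.vstack([
    attributes[0] + 0.01 * rng.standard_normal((5, 3)),
    attributes[2] + 0.01 * rng.standard_normal((5, 3)),
])

centroids, labels = kmeans(patches, k=2)
part_to_attr = nearest_attribute(centroids, attributes)
replaced = replace_parts(patches, labels, attributes, part_to_attr, alpha=1.0)
```

Increasing `alpha` over training would be one way to approximate the gradual alignment described above; how RAZH actually schedules the replacement is not stated in this summary.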
Yan Jiang
Faculty of Electrical Engineering and Computer Science, and Merchants’ Guild Economics and Cultural Intelligent Computing Laboratory, Ningbo University, Ningbo 315211, China
Zhongmiao Qi
Faculty of Electrical Engineering and Computer Science, and Merchants’ Guild Economics and Cultural Intelligent Computing Laboratory, Ningbo University, Ningbo 315211, China
Jianhao Li
Faculty of Electrical Engineering and Computer Science, and Merchants’ Guild Economics and Cultural Intelligent Computing Laboratory, Ningbo University, Ningbo 315211, China
Jiangbo Qian
Faculty of Electrical Engineering and Computer Science, and Merchants’ Guild Economics and Cultural Intelligent Computing Laboratory, Ningbo University, Ningbo 315211, China
Chong Wang
Faculty of Electrical Engineering and Computer Science, and Merchants’ Guild Economics and Cultural Intelligent Computing Laboratory, Ningbo University, Ningbo 315211, China
Yu Xin
University of Florida
Computer Vision, Vision-language Model