🤖 AI Summary
Existing zero-shot hashing methods rely on global, image-level semantic attribute alignment and neglect the correspondence between local regions and fine-grained, part-level attributes, which introduces noise and leads to inaccurate alignment. To address this, we propose the first part-level zero-shot hashing framework explicitly designed for pixel-level semantic reconstruction. Our approach first localizes discriminative local regions via image patch clustering, then establishes a part-level attribute alignment mechanism. We further introduce a differentiable attribute vector replacement and reconstruction optimization module to enable end-to-end hash learning. Evaluated on multiple standard zero-shot hashing benchmarks, our method consistently outperforms state-of-the-art approaches, achieving up to a 12.6% improvement in mean Average Precision (mAP). These results empirically support the view that part-level semantic alignment is critical for cross-category generalization in zero-shot hashing.
📝 Abstract
Hashing algorithms have been widely used in large-scale image retrieval tasks, especially for seen-class data. Zero-shot hashing algorithms have been proposed to handle unseen-class data. The key technique in these algorithms is to learn features from seen classes and transfer them to unseen classes, that is, to align the feature embeddings of the seen and unseen classes. Most existing zero-shot hashing algorithms use the attributes shared by the seen and unseen classes to perform this alignment. However, the attributes are always described for a whole image, even though each one refers to a specific part of the image. Hence, these methods ignore the importance of aligning attributes with the corresponding image parts, which introduces noise and reduces the accuracy of the alignment between the features of seen and unseen classes. To address this problem, we propose a new zero-shot hashing method called RAZH. We first use a clustering algorithm to group similar patches into image parts for attribute matching, and then replace the image parts with the corresponding attribute vectors, gradually aligning each part with its nearest attribute. Extensive evaluation results demonstrate the superiority of the RAZH method over several state-of-the-art methods.
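The two steps named in the abstract (clustering patches into parts, then replacing each part with its nearest attribute vector) can be illustrated with a minimal NumPy sketch. This is not the RAZH implementation: the plain k-means routine, the cosine-similarity matching, the function names, and the `ratio` blend weight standing in for "gradual" replacement are all assumptions made for illustration, and the patch and attribute embeddings are assumed to live in the same vector space.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (illustrative choice); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every patch to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    # Final assignment so labels agree with the returned centroids.
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1), centroids


def replace_parts_with_attributes(patches, attributes, k=4, ratio=0.5):
    """Cluster patches into parts, then blend each part toward its nearest attribute.

    patches:    (num_patches, dim) patch embeddings of one image
    attributes: (num_attributes, dim) attribute vectors, assumed same space
    ratio:      hypothetical blend weight; 0 keeps the patch, 1 fully replaces it
    """
    labels, centroids = kmeans(patches, k)
    # Cosine similarity between part centroids and attribute vectors.
    c = centroids / (np.linalg.norm(centroids, axis=1, keepdims=True) + 1e-12)
    a = attributes / (np.linalg.norm(attributes, axis=1, keepdims=True) + 1e-12)
    nearest = (c @ a.T).argmax(axis=1)      # attribute index chosen for each part
    matched = attributes[nearest[labels]]   # attribute vector for every patch
    blended = (1.0 - ratio) * patches + ratio * matched
    return blended, labels, nearest


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    patches = rng.normal(size=(16, 8))      # toy data: 16 patches, 8-dim features
    attributes = rng.normal(size=(5, 8))    # toy data: 5 attribute vectors
    blended, labels, nearest = replace_parts_with_attributes(patches, attributes)
    print(blended.shape, labels.shape, nearest.shape)
```

Raising `ratio` from 0 toward 1 over training would be one way to read "gradually aligning each part with its nearest attribute", though the abstract does not specify the schedule or the similarity measure actually used.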