🤖 AI Summary
To address the significant performance degradation of 2D face recognition under large pose differences between enrollment and probe imagery, this paper proposes a 2D–3D cross-modal domain adaptation framework that lets image-based (2D) representations infer properties of inherently pose-invariant point cloud (3D) representations. Methodologically, it introduces (1) a shared (joint) cross-modal attention mapping that emphasizes the patterns most correlated between 2D facial images and 3D facial point clouds, and (2) a joint entropy regularization loss that uses both attention maps to promote consistency between the intersecting 2D and 3D representations. Crucially, the method operates without 3D label supervision, leveraging 3D point clouds as geometric guidance to learn pose-robust 2D representations. Evaluated on FaceScape and ARL-VTF, it improves TAR @ 1% FAR for profile faces (90° and beyond) by at least 7.1% and 1.57%, respectively, over competitive methods.
📝 Abstract
Despite recent advances in facial recognition, there remains a fundamental issue concerning degradations in performance due to substantial perspective (pose) differences between enrollment and query (probe) imagery. Therefore, we propose a novel domain adaptive framework to facilitate improved performances across large discrepancies in pose by enabling image-based (2D) representations to infer properties of inherently pose invariant point cloud (3D) representations. Specifically, our proposed framework achieves better pose invariance by using (1) a shared (joint) attention mapping to emphasize common patterns that are most correlated between 2D facial images and 3D facial data and (2) a joint entropy regularizing loss to promote better consistency, enhancing correlations among the intersecting 2D and 3D representations, by leveraging both attention maps. This framework is evaluated on FaceScape and ARL-VTF datasets, where it outperforms competitive methods by achieving profile (90°+) TAR @ 1% FAR improvements of at least 7.1% and 1.57%, respectively.
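For readers unfamiliar with the reported metric, the following is a minimal sketch of how TAR @ 1% FAR (true accept rate at a 1% false accept rate) is commonly computed from verification scores. It is not taken from the paper: the function name `tar_at_far`, the strict "greater than" acceptance rule, and the toy scores are illustrative assumptions, and the authors' evaluation protocol may differ in its details.

```python
import numpy as np

def tar_at_far(genuine, impostor, far=0.01):
    """Return (TAR, threshold) at a target false accept rate.

    Hypothetical helper for illustration only. A pair is accepted when
    its similarity score is strictly greater than the threshold.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)

    # Sort impostor (non-mated) scores from highest to lowest.
    imp_sorted = np.sort(impostor)[::-1]

    # Number of impostor pairs we may accept without exceeding the target FAR.
    k = int(np.floor(far * imp_sorted.size))

    # Use the (k+1)-th highest impostor score as the threshold, so that at
    # most k impostor scores lie strictly above it.
    threshold = imp_sorted[k] if k < imp_sorted.size else -np.inf

    # TAR is the fraction of genuine (mated) pairs accepted at that threshold.
    tar = float(np.mean(genuine > threshold))
    return tar, float(threshold)

# Toy example with made-up scores (not data from the paper).
impostor_scores = np.arange(100)            # 100 non-mated comparison scores
genuine_scores = [97.0, 98.5, 99.5, 100.0]  # 4 mated comparison scores
tar, thr = tar_at_far(genuine_scores, impostor_scores, far=0.01)
print(f"TAR @ 1% FAR = {tar:.2f} (threshold = {thr})")
```

Under this convention, a reported improvement in TAR @ 1% FAR means that more genuine profile-to-frontal pairs are accepted while the share of accepted impostor pairs stays capped at 1%.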