AI Summary
Existing monocular 6DoF head pose estimation methods suffer from limited robustness and poor cross-dataset generalization, particularly due to inconsistent definitions of the head center, leading to long-overlooked biases in translation evaluation. To address this, we propose TRG, the first architecture establishing an explicit bidirectional geometric interaction between facial geometry and head translation. TRG introduces two key innovations: (1) bounding-box correction parameter estimation to refine head localization, and (2) keypoint-based image alignment to enforce geometric consistency. These components jointly integrate geometric constraints with feature-level bidirectional interaction. Extensive experiments on ARKitFace and BIWI demonstrate that TRG significantly outperforms state-of-the-art methods, achieving superior 3D spatial localization accuracy. The source code is publicly available.
Abstract
This study addresses the challenge of estimating head translation in six-degrees-of-freedom (6DoF) head pose estimation, placing emphasis on this aspect over the more commonly studied head rotation. Existing methods underutilize the potential synergy between facial geometry and head translation. To bridge this gap, we propose the head Translation, Rotation, and face Geometry network (TRG), which is distinguished by its explicit bidirectional interaction structure. This structure is designed to leverage the complementary relationship between face geometry and head translation. Our contributions also include a strategy for estimating bounding box correction parameters and a technique for aligning landmarks to the image, both of which improve performance in 6DoF head pose estimation. Extensive experiments on the ARKitFace and BIWI datasets confirm that the proposed method outperforms current state-of-the-art techniques. Code is released at https://github.com/asw91666/TRG-Release.
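To make the link between face geometry and head translation concrete, the sketch below shows the standard pinhole-camera relation that couples them: the apparent size of a face of known metric width fixes its depth, and the box center then fixes the lateral offset. This is an illustration only and not the TRG method. The names `Intrinsics` and `translation_from_bbox`, and the fixed `face_width_m` value, are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    fx: float  # focal length in pixels (x)
    fy: float  # focal length in pixels (y)
    cx: float  # principal point x in pixels
    cy: float  # principal point y in pixels


def translation_from_bbox(bbox, face_width_m, K):
    """Rough head translation (metres, camera frame) from a face box.

    bbox is (x_min, y_min, x_max, y_max) in pixels. face_width_m is an
    assumed metric face width (hypothetical constant for illustration);
    a learned model would take scale from predicted face geometry.
    """
    x_min, y_min, x_max, y_max = bbox
    width_px = x_max - x_min
    if width_px <= 0:
        raise ValueError("bounding box must have positive width")
    # Similar triangles under the pinhole model:
    # width_px / fx = face_width_m / z
    z = K.fx * face_width_m / width_px
    # Back-project the box centre to depth z.
    u = 0.5 * (x_min + x_max)
    v = 0.5 * (y_min + y_max)
    x = (u - K.cx) * z / K.fx
    y = (v - K.cy) * z / K.fy
    return (x, y, z)
```

Because depth is inversely proportional to the box width, a small error in the box or in the assumed face size shifts the whole translation estimate. This sensitivity is one reason that refining the bounding box and using face geometry for scale are natural companions to translation estimation.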