🤖 AI Summary
Existing facial landmark detection methods model facial geometric structure poorly and lose robustness under challenging conditions such as large head poses, extreme illumination variations, and diverse facial expressions. To address these limitations, this work proposes a frequency-guided, task-balanced Transformer framework. It introduces a Frequency-Guided Structure-Aware (FGSA) module that incorporates structural priors from the frequency domain, and a Fine-grained Multi-task Balanced loss (FMB-loss) that applies landmark-level adaptive weighting to mitigate gradient conflicts among multiple tasks. Combined with a unified cross-dataset training strategy, the method achieves state-of-the-art performance on several mainstream benchmarks, improving both accuracy and robustness in complex real-world scenarios.
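
The summary does not give the FMB-loss formulation, so the following is only a minimal sketch of what landmark-level adaptive weighting could look like, not the paper's method. It assumes each landmark is treated as its own sub-task and that harder landmarks (larger current error) receive larger weights through a softmax. The function names, the softmax rule, and the `temperature` parameter are all hypothetical choices made for illustration.

```python
import numpy as np

def per_landmark_errors(pred, target):
    """Mean Euclidean error per landmark over a batch.

    pred, target: arrays of shape (batch, num_landmarks, 2).
    Returns an array of shape (num_landmarks,).
    """
    return np.linalg.norm(pred - target, axis=-1).mean(axis=0)

def adaptive_weights(errors, temperature=1.0):
    """Hypothetical weighting rule: softmax over per-landmark errors,
    rescaled so that the weights average to 1."""
    z = errors / temperature
    z = z - z.max()                      # subtract max for numerical stability
    w = np.exp(z)
    return w * len(errors) / w.sum()

def weighted_landmark_loss(pred, target, temperature=1.0):
    """Weighted mean of per-landmark errors; returns (loss, weights)."""
    errors = per_landmark_errors(pred, target)
    weights = adaptive_weights(errors, temperature)
    loss = float((weights * errors).sum() / len(errors))
    return loss, weights

# Toy batch: 4 samples, 3 landmarks; only the third landmark is mispredicted.
pred = np.zeros((4, 3, 2))
target = np.zeros((4, 3, 2))
target[:, 2, 0] = 1.0
loss, weights = weighted_landmark_loss(pred, target)
```

In this toy batch the third landmark receives the largest weight, while the weights still average to 1, so the overall loss scale stays comparable to an unweighted mean. In an actual training loop the weights would typically be treated as constants with respect to the gradient; that detail is also an assumption here.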