🤖 AI Summary
To address the accuracy-efficiency trade-off in head pose estimation on edge devices, this paper proposes Grouped Attention Deep Sets (GADS), a lightweight architecture. GADS semantically groups facial landmarks into regional clusters and integrates lightweight Deep Sets layers with grouped multi-head attention to enable efficient cross-group feature fusion. Two inference paradigms are introduced: a vanilla landmark-only variant and a Hybrid-GADS variant that fuses RGB image features. The model achieves state-of-the-art (SOTA) accuracy on AFLW2000, BIWI, and 300W-LP while reducing parameter count to just 1/7.5 of the previous lightest SOTA method, accelerating inference by 25×, and remaining 4321× smaller than the best-performing model. These gains underscore GADS's effectiveness in balancing computational efficiency and estimation precision for resource-constrained deployment.
📝 Abstract
In human-computer interaction, head pose estimation profoundly influences application functionality. Although utilizing facial landmarks is valuable for this purpose, existing landmark-based methods prioritize precision over simplicity and model size, limiting their deployment on edge devices and in compute-poor environments. To bridge this gap, we propose **Grouped Attention Deep Sets (GADS)**, a novel architecture based on the Deep Set framework. By grouping landmarks into regions and employing small Deep Set layers, we reduce computational complexity. Our multi-head attention mechanism extracts and combines inter-group information, resulting in a model that is 7.5× smaller and executes 25× faster than the current lightest state-of-the-art model. Notably, our method achieves an impressive reduction, being 4321× smaller than the best-performing model. We introduce vanilla GADS and Hybrid-GADS (landmarks + RGB) and evaluate our models on three benchmark datasets -- AFLW2000, BIWI, and 300W-LP. We envision our architecture as a robust baseline for resource-constrained head pose estimation methods.
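The core idea — per-region Deep Sets encoders whose group embeddings are fused by attention — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the group sizes, layer widths, and random weights are made up, and a single attention head stands in for the grouped multi-head mechanism.

```python
import numpy as np

def deep_set_group(landmarks, W_phi, W_rho):
    """Encode one landmark group: per-point MLP, sum-pool, then a group MLP.
    Sum pooling makes the embedding invariant to landmark ordering."""
    h = np.maximum(landmarks @ W_phi, 0.0)   # phi: applied to each point independently
    pooled = h.sum(axis=0)                   # permutation-invariant aggregation
    return np.maximum(pooled @ W_rho, 0.0)   # rho: transform the pooled summary

def attention_over_groups(G, Wq, Wk, Wv):
    """Single-head self-attention across group embeddings G of shape (num_groups, d)."""
    Q, K, V = G @ Wq, G @ Wk, G @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
d = 16
# Hypothetical regional grouping of 68 2-D landmarks (e.g. eyes, nose, mouth, jaw);
# the actual grouping used in the paper may differ.
groups = [rng.normal(size=(n, 2)) for n in (12, 12, 9, 20, 15)]
W_phi, W_rho = rng.normal(size=(2, d)), rng.normal(size=(d, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

G = np.stack([deep_set_group(g, W_phi, W_rho) for g in groups])  # (5, d) group embeddings
fused = attention_over_groups(G, Wq, Wk, Wv)                     # cross-group fusion, (5, d)
pose = fused.mean(axis=0) @ rng.normal(size=(d, 3))              # toy head: yaw, pitch, roll
print(pose.shape)  # (3,)
```

The Deep Sets structure is what keeps the model small: the per-point encoder is shared across landmarks within a group, so parameter count is independent of how many landmarks a region contains.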