GADS: A Super Lightweight Model for Head Pose Estimation

📅 2025-04-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the accuracy-efficiency trade-off in head pose estimation on edge devices, this paper proposes Grouped Attention Deep Sets (GADS), a lightweight architecture. GADS groups facial landmarks into facial regions, encodes each group with a small Deep Sets layer, and uses multi-head attention to fuse information across groups. Two variants are introduced: a vanilla, landmark-only GADS and Hybrid-GADS, which combines landmarks with RGB image features. The models are evaluated on AFLW2000, BIWI, and 300W-LP. GADS is 7.5× smaller and 25× faster than the previous lightest state-of-the-art (SOTA) model, and 4321× smaller than the best-performing model. The authors position it as a baseline for resource-constrained head pose estimation.

📝 Abstract
In human-computer interaction, head pose estimation profoundly influences application functionality. Although utilizing facial landmarks is valuable for this purpose, existing landmark-based methods prioritize precision over simplicity and model size, limiting their deployment on edge devices and in compute-poor environments. To bridge this gap, we propose \textbf{Grouped Attention Deep Sets (GADS)}, a novel architecture based on the Deep Set framework. By grouping landmarks into regions and employing small Deep Set layers, we reduce computational complexity. Our multihead attention mechanism extracts and combines inter-group information, resulting in a model that is $7.5\times$ smaller and executes $25\times$ faster than the current lightest state-of-the-art model. Notably, our method achieves an impressive reduction, being $4321\times$ smaller than the best-performing model. We introduce vanilla GADS and Hybrid-GADS (landmarks + RGB) and evaluate our models on three benchmark datasets -- AFLW2000, BIWI, and 300W-LP. We envision our architecture as a robust baseline for resource-constrained head pose estimation methods.
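
The grouping step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the 68-point landmark layout, the `GROUPS` index ranges, the layer sizes, and the random weights are all assumptions made for illustration. It shows one small Deep Set per facial region, where sum pooling makes each group embedding independent of the order of landmarks within that group.

```python
import math
import random

# Hypothetical grouping of a 68-point landmark layout into facial regions.
# The paper's actual grouping is not given in this summary.
GROUPS = {
    "jaw": range(0, 17),
    "brows": range(17, 27),
    "nose": range(27, 36),
    "eyes": range(36, 48),
    "mouth": range(48, 68),
}

def make_linear(n_in, n_out, rng):
    """Random weights and zero bias for a tiny dense layer (illustrative only)."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def linear(x, layer):
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def deep_set(points, phi, rho):
    """Deep Sets form rho(sum_i phi(x_i)); the sum makes it order-invariant."""
    encoded = [relu(linear(p, phi)) for p in points]
    pooled = [sum(col) for col in zip(*encoded)]
    return linear(pooled, rho)

def encode_groups(landmarks, params):
    """Apply one small Deep Set per region, giving one embedding per group."""
    return {
        name: deep_set([landmarks[i] for i in idx], *params[name])
        for name, idx in GROUPS.items()
    }

rng = random.Random(0)
HIDDEN, EMBED = 8, 4  # assumed sizes, chosen only to keep the example small
params = {
    name: (make_linear(2, HIDDEN, rng), make_linear(HIDDEN, EMBED, rng))
    for name in GROUPS
}
landmarks = [(rng.random(), rng.random()) for _ in range(68)]  # fake (x, y) points
embeddings = encode_groups(landmarks, params)

# Reversing the order of the eye landmarks leaves the eye embedding unchanged
# (up to floating-point rounding), which is the Deep Sets invariance property.
shuffled = list(landmarks)
shuffled[36:48] = reversed(shuffled[36:48])
same = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(embeddings["eyes"], encode_groups(shuffled, params)["eyes"])
)
print(same)
```

Because each region gets its own small network, the parameter count grows with the number of groups and the layer widths rather than with the number of landmarks, which is one plausible reading of how the design keeps the model small.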
Problem

Research questions and friction points this paper is trying to address.

Lightweight head pose estimation for edge devices
Reducing model size and computational complexity
Improving efficiency without sacrificing accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Grouped Attention Deep Sets for efficiency
Small Deep Set layers reduce complexity
Multihead attention combines inter-group information
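
As a rough illustration of how multi-head attention could combine inter-group information, the sketch below runs scaled dot-product self-attention over a handful of group embeddings in pure Python. Everything specific here is an assumption rather than the paper's design: the five stand-in group vectors, the head count and dimensions, the random projection matrices, the omission of an output projection, and the final linear map to yaw, pitch, and roll.

```python
import math
import random

def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def attention_head(tokens, wq, wk, wv):
    """Scaled dot-product self-attention for one head; returns outputs and weights."""
    q = [matvec(wq, t) for t in tokens]
    k = [matvec(wk, t) for t in tokens]
    v = [matvec(wv, t) for t in tokens]
    scale = math.sqrt(len(q[0]))
    outputs, weights = [], []
    for qi in q:
        w = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        weights.append(w)
        outputs.append(
            [sum(wj * vj[d] for wj, vj in zip(w, v)) for d in range(len(v[0]))]
        )
    return outputs, weights

def multi_head_attention(tokens, heads):
    """Concatenate per-head outputs per token (no output projection, for brevity)."""
    per_head = [attention_head(tokens, *h)[0] for h in heads]
    return [sum((head[i] for head in per_head), []) for i in range(len(tokens))]

rng = random.Random(0)
EMBED, N_HEADS, HEAD_DIM = 4, 2, 2  # assumed sizes
# Stand-ins for per-region embeddings (five hypothetical facial groups).
group_tokens = [[rng.uniform(-1, 1) for _ in range(EMBED)] for _ in range(5)]
# Each head holds its own (W_q, W_k, W_v) projection matrices.
heads = [
    tuple(rand_matrix(HEAD_DIM, EMBED, rng) for _ in range(3))
    for _ in range(N_HEADS)
]
fused = multi_head_attention(group_tokens, heads)

# Mean-pool the fused group tokens, then apply a hypothetical linear head
# that regresses three pose angles.
pooled = [sum(col) / len(fused) for col in zip(*fused)]
w_out = rand_matrix(3, N_HEADS * HEAD_DIM, rng)
yaw, pitch, roll = matvec(w_out, pooled)
```

With only a few group tokens, the attention cost is quadratic in the number of groups rather than in the number of landmarks, so the fusion step stays cheap.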
Menan Velayuthan
Research Engineer, University of Moratuwa
Neural Machine Translation, Graph Neural Networks, Deep Sets, Geometric Deep Learning, Computer
Asiri Gawesha
Temporary Lecturer, Open University of Sri Lanka
Machine learning, Edge computing, TinyML, Model Compression, Efficient Architectures
Purushoth Velayuthan
AI Engineer
Artificial Intelligence, Computer vision, AR/VR/MR technologies, GenAI, Remote sensing
N. Kodagoda
Faculty of Computing, Sri Lanka Institute of Information Technology, New Kandy Road, Malabe, 10115, Western, Sri Lanka
D. Kasthurirathna
Faculty of Computing, Sri Lanka Institute of Information Technology, New Kandy Road, Malabe, 10115, Western, Sri Lanka
Pradeepa Samarasinghe
Faculty of Computing, Sri Lanka Institute of Information Technology, New Kandy Road, Malabe, 10115, Western, Sri Lanka