PAL-Net: A Point-Wise CNN with Patch-Attention for 3D Facial Landmark Localization

📅 2025-10-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Manual annotation of anatomical landmarks on 3D facial scans is time-consuming and expert-dependent, which hinders clinical deployment. The paper proposes PAL-Net, a fully automated deep learning pipeline that localizes 50 anatomical landmarks on stereo-photogrammetry facial models. It combines coarse alignment, region-of-interest (ROI) filtering, and an initial approximation of the landmarks with a patch-based point-wise CNN enhanced by attention mechanisms, without relying on complex input representations. On 214 annotated scans from healthy adults, PAL-Net achieves a mean localization error of 3.686 mm and preserves anatomical distances with an average error of 2.822 mm, comparable to intra-observer variability. On 700 subjects from the FaceScape dataset, it reaches a point-wise error of 0.41 mm and a distance-wise error of 0.38 mm. Accuracy is consistent across most anatomical regions but degrades where mesh quality is poor (e.g., ears, hairline).

📝 Abstract

Manual annotation of anatomical landmarks on 3D facial scans is a time-consuming and expertise-dependent task, yet it remains critical for clinical assessments, morphometric analysis, and craniofacial research. While several deep learning methods have been proposed for facial landmark localization, most focus on pseudo-landmarks or require complex input representations, limiting their clinical applicability. This study presents a fully automated deep learning pipeline (PAL-Net) for localizing 50 anatomical landmarks on stereo-photogrammetry facial models. The method combines coarse alignment, region-of-interest filtering, and an initial approximation of landmarks with a patch-based point-wise CNN enhanced by attention mechanisms. Trained and evaluated on 214 annotated scans from healthy adults, PAL-Net achieved a mean localization error of 3.686 mm and preserved relevant anatomical distances with a 2.822 mm average error, comparable to intra-observer variability. To assess generalization, the model was further evaluated on 700 subjects from the FaceScape dataset, achieving a point-wise error of 0.41 mm and a distance-wise error of 0.38 mm. Compared to existing methods, PAL-Net offers a favorable trade-off between accuracy and computational cost. While performance degrades in regions with poor mesh quality (e.g., ears, hairline), the method demonstrates consistent accuracy across most anatomical regions. PAL-Net generalizes effectively across datasets and facial regions, outperforming existing methods in both point-wise and structural evaluations. It provides a lightweight, scalable solution for high-throughput 3D anthropometric analysis, with potential to support clinical workflows and reduce reliance on manual annotation. Source code can be found at https://github.com/Ali5hadman/PAL-Net-A-Point-Wise-CNN-with-Patch-Attention
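
The two error measures quoted in the abstract can be illustrated with a short NumPy sketch. This is not taken from the PAL-Net repository: the function names `pointwise_error` and `distance_error` are hypothetical, and comparing all landmark pairs is an assumption, since the abstract only says "relevant anatomical distances" without listing the pairs used.

```python
import numpy as np
from itertools import combinations


def pointwise_error(pred, ref):
    """Mean Euclidean distance (in the scan's units, e.g. mm) between
    predicted and reference landmarks, both arrays of shape (N, 3)."""
    return float(np.linalg.norm(pred - ref, axis=1).mean())


def distance_error(pred, ref, pairs=None):
    """Mean absolute difference between inter-landmark distances.

    `pairs` is a list of (i, j) landmark index pairs. Defaulting to all
    pairs is an assumption made for this sketch only.
    """
    if pairs is None:
        pairs = list(combinations(range(len(ref)), 2))
    i, j = np.array(pairs).T
    d_pred = np.linalg.norm(pred[i] - pred[j], axis=1)
    d_ref = np.linalg.norm(ref[i] - ref[j], axis=1)
    return float(np.abs(d_pred - d_ref).mean())


# Toy data: three reference landmarks and a prediction shifted 1 unit in x.
ref = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
pred = ref + np.array([1.0, 0.0, 0.0])
```

A rigid shift of every landmark changes the point-wise error but leaves all inter-landmark distances intact, which is why the two measures are reported separately: the first captures placement accuracy, the second captures whether the facial structure is preserved.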

Problem

Research questions and friction points this paper is trying to address.

Automating anatomical landmark localization on 3D facial scans

Reducing manual annotation dependency for clinical assessments

Improving accuracy and efficiency in craniofacial research analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Point-wise CNN with patch-attention for 3D landmarks

Combines coarse alignment with ROI filtering

Lightweight, scalable solution for clinical workflows
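
The preprocessing named above (coarse alignment followed by ROI filtering) can be sketched roughly as follows. The page does not say how PAL-Net implements either step, so everything here is an assumption for illustration: alignment is approximated by centering and rotating onto principal axes, the ROI is a sphere around an initial landmark estimate, and the names `coarse_align` and `roi_patch` are hypothetical.

```python
import numpy as np


def coarse_align(points):
    """Center an (N, 3) point cloud and rotate it onto its principal axes.

    PCA alignment is an assumed stand-in for the paper's coarse alignment;
    axis signs are ambiguous and would need a convention in practice.
    """
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal directions of the centered cloud.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt.T


def roi_patch(points, seed, radius):
    """Keep only the points within `radius` of an initial landmark guess."""
    mask = np.linalg.norm(points - seed, axis=1) <= radius
    return points[mask]


# Toy data: an anisotropic random cloud offset from the origin.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 1.0]) + np.array([10.0, 20.0, 30.0])
aligned = coarse_align(cloud)
patch = roi_patch(aligned, seed=np.zeros(3), radius=3.0)
```

Restricting the network input to a local patch around each approximate landmark keeps the per-landmark point count small, which is one plausible reading of how a patch-based design stays lightweight.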
