Heatmap Regression without Soft-Argmax for Facial Landmark Detection

📅 2025-08-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Heatmap-based facial landmark detection decodes each landmark by taking the argmax over a predicted heatmap. Because argmax is not differentiable, existing methods rely on a differentiable approximation, Soft-argmax, to train end-to-end. Method: This paper revisits that choice and proposes an alternative training objective, based on the classic structured prediction framework, that removes Soft-argmax from training entirely. Contribution/Results: The method achieves state-of-the-art performance on the WFLW, COFW, and 300W benchmarks, with mean normalized error (NME) matching or surpassing leading approaches, and training converges 2.2× faster. The results show that Soft-argmax is not the only way to achieve strong performance in heatmap regression for facial landmark detection.

📝 Abstract

Facial landmark detection is an important task in computer vision with numerous applications, such as head pose estimation, expression analysis, and face swapping. Heatmap regression-based methods have been widely used to achieve state-of-the-art results in this task. These methods involve computing the argmax over the heatmaps to predict a landmark. Since argmax is not differentiable, these methods use a differentiable approximation, Soft-argmax, to enable end-to-end training of deep networks. In this work, we revisit this long-standing choice of using Soft-argmax and demonstrate that it is not the only way to achieve strong performance. Instead, we propose an alternative training objective based on the classic structured prediction framework. Empirically, our method achieves state-of-the-art performance on three facial landmark benchmarks (WFLW, COFW, and 300W), converging 2.2x faster during training while maintaining better or competitive accuracy. Our code is available here: https://github.com/ca-joe-yang/regression-without-softarg.
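
To make the argmax versus Soft-argmax distinction concrete, the following is a minimal pure-Python sketch of both decoders on a toy heatmap. The function names, the toy scores, and the temperature parameter `beta` are illustrative assumptions, not taken from the paper or its repository.

```python
import math

def hard_argmax(heatmap):
    """Return the (x, y) pixel with the largest score.

    The output is piecewise constant in the scores, so it gives no
    useful gradient for training.
    """
    pixels = ((x, y) for y, row in enumerate(heatmap) for x in range(len(row)))
    return max(pixels, key=lambda p: heatmap[p[1]][p[0]])

def soft_argmax(heatmap, beta=1.0):
    """Return the expected (x, y) under softmax(beta * heatmap).

    This is a smooth function of the scores; `beta` is an assumed
    temperature that sharpens the distribution as it grows.
    """
    peak = max(max(row) for row in heatmap)  # subtracted for numerical stability
    weights = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(w * xi for row in weights for xi, w in enumerate(row)) / total
    y = sum(w * yi for yi, row in enumerate(weights) for w in row) / total
    return x, y

# Toy 3x4 heatmap with its peak at x=2, y=1 (hypothetical values).
heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 4.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

print(hard_argmax(heatmap))             # (2, 1)
print(soft_argmax(heatmap, beta=1.0))   # pulled away from the peak by background mass
print(soft_argmax(heatmap, beta=50.0))  # very close to (2.0, 1.0)
```

With a small `beta`, the Soft-argmax estimate is biased toward the heatmap center because every pixel contributes to the expectation; it only approaches the true argmax as the distribution becomes sharply peaked.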

Problem

Research questions and friction points this paper is trying to address.

Replacing soft-argmax in facial landmark heatmap regression

Proposing structured prediction for landmark detection training

Achieving faster convergence without soft-argmax approximation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Replacing soft-argmax with structured prediction

Achieving faster convergence without soft-argmax

Maintaining competitive accuracy on landmark benchmarks
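
For readers unfamiliar with structured prediction, the sketch below shows one classic objective from that framework, a margin-rescaled structured hinge loss, applied to heatmap scores. This is a generic textbook formulation chosen for illustration; the summary does not specify the paper's actual objective, so the loss form, the Euclidean task loss, and all names here are assumptions.

```python
import math

def structured_hinge(heatmap, target):
    """Margin-rescaled structured hinge loss over pixel locations.

    heatmap: list of rows of scores; target: ground-truth (x, y) pixel.
    Returns (loss, most_violating_pixel). The loss is zero only when the
    target outscores every other pixel by at least its distance to it.
    """
    tx, ty = target
    best_value, best_pixel = -math.inf, None
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            # Loss-augmented score: model score plus distance to the target.
            value = score + math.hypot(x - tx, y - ty)
            if value > best_value:
                best_value, best_pixel = value, (x, y)
    return best_value - heatmap[ty][tx], best_pixel

# Toy 3x4 heatmap with its peak at x=2, y=1 (hypothetical values).
heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 4.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

print(structured_hinge(heatmap, (2, 1)))  # (0.0, (2, 1)): the peak matches the target
print(structured_hinge(heatmap, (1, 1)))  # (4.0, (2, 1)): the peak is one pixel off
```

The loss depends on the scores only through a max and a lookup, so a subgradient exists with respect to the heatmap (positive at the most violating pixel, negative at the target) and no coordinate decoding step is needed during training.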
