Enhancing Noise Resilience in Face Clustering via Sparse Differential Transformer

📅 2025-12-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing facial clustering methods rely on Jaccard similarity to enhance embedding relationship modeling but suffer from spurious node inclusion, degrading discriminability; moreover, the optimal Top-K neighborhood size is difficult to predict adaptively, and conventional Transformers introduce noise by over-modeling irrelevant features. To address these issues, we propose a prediction-driven Top-K Jaccard similarity framework coupled with a sparse differential Transformer architecture. First, we introduce a novel *prediction-guided Top-K neighborhood selection* mechanism that dynamically refines neighborhood purity based on clustering confidence. Second, we design a *sparse differential Transformer*, which jointly employs differential feature representation and sparse attention to suppress irrelevant responses, thereby significantly improving robustness and noise resilience in similarity estimation. Extensive experiments on MS-Celeb-1M and other benchmarks demonstrate state-of-the-art performance: our method achieves superior clustering accuracy and generalization compared to existing approaches.

📝 Abstract
The method used to measure relationships between face embeddings plays a crucial role in determining the performance of face clustering. Existing methods employ the Jaccard similarity coefficient instead of the cosine distance to enhance measurement accuracy. However, these methods introduce too many irrelevant nodes, producing Jaccard coefficients with limited discriminative power and adversely affecting clustering performance. To address this issue, we propose a prediction-driven Top-K Jaccard similarity coefficient that enhances the purity of neighboring nodes, thereby improving the reliability of similarity measurements. Nevertheless, accurately predicting the optimal number of neighbors (Top-K) remains challenging, leading to suboptimal clustering results. To overcome this limitation, we develop a Transformer-based prediction model that examines the relationships between the central node and its neighboring nodes near the Top-K boundary to further enhance the reliability of similarity estimation. However, the vanilla Transformer, when applied to predict relationships between nodes, often introduces noise due to its overemphasis on irrelevant feature relationships. To address this, we propose a Sparse Differential Transformer (SDT) in place of the vanilla Transformer to suppress noise and enhance the model's anti-noise capability. Extensive experiments on multiple datasets, such as MS-Celeb-1M, demonstrate that our approach achieves state-of-the-art (SOTA) performance, outperforming existing methods and providing a more robust solution for face clustering.
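The Top-K Jaccard measurement described in the abstract can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: `topk_neighbors` and `topk_jaccard` are hypothetical helper names, cosine similarity is used to build the neighbor sets, and K is fixed here rather than predicted by a model as in the proposed method.

```python
import numpy as np

def topk_neighbors(embeddings, k):
    """Indices of each row's Top-K cosine-nearest neighbors (self excluded)."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)          # never pick a node as its own neighbor
    return np.argsort(-sims, axis=1)[:, :k]

def topk_jaccard(neighbors, i, j):
    """Jaccard similarity between the Top-K neighbor sets of nodes i and j."""
    a, b = set(neighbors[i]), set(neighbors[j])
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)
```

Restricting the Jaccard computation to each node's Top-K neighbors is what keeps irrelevant nodes out of the intersection and union; the paper's contribution is to predict a good K per node instead of fixing it globally.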
Problem

Research questions and friction points this paper is trying to address.

Improves face clustering by enhancing noise resilience in similarity measurement.
Addresses limited discriminative power of Jaccard coefficients in existing methods.
Proposes a Sparse Differential Transformer to suppress noise and improve robustness.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Prediction-driven Top-K Jaccard similarity enhances neighbor purity
Transformer-based model improves reliability of similarity estimation
Sparse Differential Transformer eliminates noise for robust clustering
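The sparse differential attention idea behind the SDT can be sketched as below. This is a minimal single-head sketch under stated assumptions, not the paper's architecture: following the general differential-attention recipe, a second softmax map is subtracted to cancel common-mode noise, and sparsity is imposed here by simple per-query top-k truncation; `sparse_differential_attention`, `lam`, and `keep` are illustrative names and parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_differential_attention(q1, k1, q2, k2, v, lam=0.5, keep=4):
    """Differential attention with top-k sparsification (illustrative sketch).

    Two attention maps are computed from separate query/key projections;
    subtracting the second (scaled by lam) cancels responses that both maps
    share, which is where much of the attention noise lives.
    """
    d = q1.shape[-1]
    a = softmax(q1 @ k1.T / np.sqrt(d)) - lam * softmax(q2 @ k2.T / np.sqrt(d))
    # Sparsify: zero out all but the `keep` largest weights in each row,
    # so only the strongest node relationships contribute.
    drop = np.argsort(-a, axis=1)[:, keep:]
    np.put_along_axis(a, drop, 0.0, axis=1)
    # Renormalize so each query's surviving weights have unit total magnitude.
    a = a / np.maximum(np.abs(a).sum(axis=1, keepdims=True), 1e-9)
    return a @ v
```

In the paper's setting, such a block would replace vanilla attention inside the Top-K prediction model, so that relationships between the central node and irrelevant neighbors receive near-zero weight.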
Dafeng Zhang
Samsung Research China – Beijing (SRC-B)
computer vision · low-level vision · face
Yongqi Song
Samsung R&D Institute China-Beijing (SRC-B), Beijing, China
Shizhuo Liu
Samsung R&D Institute China-Beijing (SRC-B), Beijing, China