🤖 AI Summary
This work addresses face morphing attack detection in high-security scenarios such as border control by proposing a parameter-efficient differential morphing attack detection (D-MAD) framework based on vision foundation models. For the first time, vision foundation models are integrated into the D-MAD paradigm, where decisions are made by analyzing embedding discrepancies between suspect images and bona fide reference captures. The approach combines lightweight fine-tuning with class-balanced optimization, updating only a small subset of parameters to preserve the model’s prior knowledge while enhancing detection performance. In cross-database evaluations on standard D-MAD benchmarks, the method outperforms existing techniques, reducing the error rate at high-security levels from 6.16% to 2.17%.
📝 Abstract
In this work, we introduce DifFoundMAD, a parameter-efficient differential morphing attack detection (D-MAD) framework that exploits the generalisation capabilities of vision foundation models (FMs) to capture discrepancies between suspected morphs and live capture images. In contrast to conventional D-MAD systems that rely on face recognition embeddings or handcrafted feature differences, DifFoundMAD follows the standard differential paradigm while replacing the underlying representation space with embeddings extracted from FMs. By combining lightweight fine-tuning with class-balanced optimisation, the proposed method updates only a small subset of parameters while preserving the rich representational priors of the underlying FMs. Extensive cross-database evaluations on standard D-MAD benchmarks demonstrate that DifFoundMAD achieves consistent improvements over state-of-the-art systems, particularly at the strict security levels required in operational deployments such as border control: at high-security levels, DifFoundMAD reduces the error rate reported for the current state of the art from 6.16% to 2.17%.
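The differential paradigm and class-balanced optimisation mentioned above can be illustrated with a small sketch. It is not the DifFoundMAD implementation: the abstract does not specify how the discrepancy is computed or how classes are balanced, so the sketch assumes an element-wise embedding difference, inverse-frequency class weights, and a weighted binary cross-entropy on classifier logits. The function names `differential_feature`, `class_balanced_weights`, and `weighted_bce`, and the label convention (1 = morph, 0 = bona fide), are hypothetical.

```python
import math


def differential_feature(suspect_emb, live_emb):
    """Element-wise difference between suspect and live-capture embeddings.

    Assumption: plain subtraction stands in for the discrepancy signal.
    """
    if len(suspect_emb) != len(live_emb):
        raise ValueError("embeddings must have the same dimension")
    return [s - l for s, l in zip(suspect_emb, live_emb)]


def class_balanced_weights(labels):
    """Inverse-frequency weights: each class gets the same total weight.

    Assumption: this is one common balancing scheme, not necessarily
    the one used in the paper.
    """
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def weighted_bce(logits, labels, weights):
    """Weighted mean binary cross-entropy computed from raw logits."""
    total, norm = 0.0, 0.0
    for z, y in zip(logits, labels):
        # Numerically stable softplus(z) = log(1 + exp(z)).
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += weights[y] * (softplus - y * z)
        norm += weights[y]
    return total / norm


# Toy data: three bona fide pairs (label 0) and one morph pair (label 1).
labels = [0, 0, 0, 1]
weights = class_balanced_weights(labels)
diff = differential_feature([1.0, 2.0], [0.5, 2.0])
loss = weighted_bce([-1.0, -2.0, -0.5, 1.5], labels, weights)
```

In a full system the embeddings would come from a foundation model with only a small subset of its parameters trainable, and the logits from a classifier applied to the differential feature; both are omitted here to keep the sketch dependency-free.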