🤖 AI Summary
Generative AI has intensified the threat of digital identity document (ID) forgery, yet existing benchmarks lack realism and legal safety for KYC scenarios. Method: We introduce FantasyID, a public benchmark dataset for ID forgery detection tailored to KYC. It combines real human faces with multilingual, multi-style ID templates; simulates realistic acquisition distortions via physical printing followed by re-capture with multiple devices; and applies digital attacks—including content injection and localized tampering—using mainstream generative models. All samples exclude synthetic faces and specimen watermarks, making the dataset usable commercially and without tampering with legal documents. Contribution/Results: Extensive evaluation reveals severe limitations of state-of-the-art detectors: TruFor, MMFusion, UniFD, and FatFormer reach a 48.7% false-negative rate under a 10% false-positive constraint. FantasyID thus establishes a more realistic, challenging, and practically meaningful benchmark for advancing ID forgery detection research.
📝 Abstract
Advancements in image generation have led to easy-to-use tools that malicious actors can exploit to create forged images. These tools pose a serious threat to widespread Know Your Customer (KYC) applications, which require robust systems for detecting forged identity documents (IDs). To facilitate the development of detection algorithms, we propose FantasyID, a novel publicly available dataset (including for commercial use) that mimics real-world IDs without tampering with legal documents and, unlike previous public datasets, contains neither generated faces nor specimen watermarks. FantasyID comprises ID cards with diverse design styles, languages, and faces of real people. To simulate a realistic KYC scenario, the cards were printed and captured with three different devices, constituting the bonafide class. We then emulated digital forgery/injection attacks that a malicious actor could perform to tamper with the IDs using existing generative tools. Current state-of-the-art forgery detection algorithms, such as TruFor, MMFusion, UniFD, and FatFormer, are challenged by the FantasyID dataset. This is especially evident under near-operational evaluation conditions: with the decision threshold set on the validation set to yield a 10% false positive rate, all detectors exhibit false negative rates close to 50% on the test set. These experiments demonstrate that FantasyID is complex enough to serve as an evaluation benchmark for detection algorithms.
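The near-operational protocol described above — fixing the decision threshold on validation bonafide scores to yield a 10% false positive rate, then measuring the false negative rate on test-set attacks — can be sketched as follows. This is a minimal illustration, not the paper's code; the score convention (higher score = more likely forged) and all names are assumptions:

```python
import numpy as np

def threshold_at_fpr(bonafide_scores, target_fpr=0.10):
    """Choose the threshold on validation bonafide scores so that roughly
    `target_fpr` of bonafide samples score above it (would be flagged).
    Assumed convention: higher score = more likely forged."""
    return float(np.quantile(np.asarray(bonafide_scores), 1.0 - target_fpr))

def fnr_at_threshold(attack_scores, threshold):
    """False negative rate: fraction of forged samples scoring below the
    threshold, i.e. attacks the detector accepts as bonafide."""
    return float(np.mean(np.asarray(attack_scores) < threshold))

# Illustrative synthetic scores, not real detector outputs:
rng = np.random.default_rng(0)
val_bonafide = rng.beta(2, 5, size=1000)   # validation bonafide scores
test_attacks = rng.beta(3, 3, size=1000)   # test-set forgery scores

thr = threshold_at_fpr(val_bonafide, target_fpr=0.10)
print(f"threshold @ 10% FPR: {thr:.3f}")
print(f"test FNR: {fnr_at_threshold(test_attacks, thr):.1%}")
```

A detector that looks strong at its equal-error point can still miss half of the attacks once the threshold is pinned to a fixed, KYC-realistic false positive budget, which is why the paper reports FNR at 10% FPR rather than a single accuracy number.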