🤖 AI Summary
To address the challenges of detecting AI-generated, high-fidelity counterfeit identity documents and the scarcity of real sensitive data in remote identity verification, this paper proposes a privacy-preserving forgery detection paradigm. Methodologically, we (1) introduce FakeIDet2-db, a large-scale, publicly available benchmark comprising over 900K patches extracted from genuine and forged identity document images, covering print, screen-display, and composite physical attacks as well as state-of-the-art AI-generated forgeries; (2) design a patch-based training framework that avoids using complete original documents, thereby preserving data privacy; and (3) develop a lightweight, efficient deep learning detector optimized for multi-condition acquisition on smartphones. Evaluated on a unified, standardized benchmark, our approach improves detection accuracy, robustness against physical tampering and AI-generated forgeries, and cross-scenario generalization.
📝 Abstract
Remote user verification in Internet-based applications is becoming increasingly important. A popular scenario consists of submitting a picture of the user's Identity Document (ID) to a service platform, authenticating its veracity, and then granting access to the requested digital service. An ID is well suited to verifying the identity of an individual, since it is government-issued, unique, and non-transferable. However, with recent advances in Artificial Intelligence (AI), attackers can bypass the security measures embedded in IDs and create very realistic physical and synthetic fake IDs. Researchers are now trying to develop methods to detect an ever-growing number of these AI-based fakes, which are almost indistinguishable from authentic (bona fide) IDs. In this counterattack effort, researchers face an important challenge: the difficulty of using real data to train fake ID detectors. This scarcity of real data for research and development stems from the sensitive nature of these documents, which are usually kept private by the ID owners (the users) and the ID holders (e.g., governments, police, banks). The main contributions of our study are: 1) We propose and discuss a patch-based methodology to preserve privacy in fake ID detection research. 2) We provide a new public database, FakeIDet2-db, comprising over 900K real/fake ID patches extracted from 2,000 ID images acquired with different smartphone sensors, illumination conditions, and capture heights. In addition, three physical attacks are considered: print, screen, and composite. 3) We present a new privacy-aware fake ID detection method, FakeIDet2. 4) We release a standard, reproducible benchmark that considers physical and synthetic attacks from popular databases in the literature.
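To make the patch-based privacy idea concrete, the sketch below cuts an ID image into non-overlapping patches and shuffles away their spatial order, so individual patches could be shared for detector training without releasing the full document layout. This is an illustrative stand-in under assumed parameters (the function name, 64-pixel patch size, and shuffling step are ours), not the paper's actual FakeIDet2 pipeline.

```python
import numpy as np

def extract_patches(image, patch_size=64, seed=0):
    """Cut an H x W x C image into non-overlapping square patches and
    shuffle their order. Hypothetical sketch of a patch-based,
    privacy-preserving data release; not the paper's actual method."""
    h, w = image.shape[:2]
    # Crop height/width down to multiples of the patch size.
    h_c, w_c = h - h % patch_size, w - w % patch_size
    img = image[:h_c, :w_c]
    patches = np.stack([
        img[i:i + patch_size, j:j + patch_size]
        for i in range(0, h_c, patch_size)
        for j in range(0, w_c, patch_size)
    ])
    rng = np.random.default_rng(seed)
    rng.shuffle(patches)  # discard the spatial ordering of the patches
    return patches
```

A detector would then be trained to classify each patch as bona fide or fake, with document-level decisions aggregated from per-patch scores.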