AI Summary
Large language models (LLMs) have drastically lowered the barrier to generating realistic fake LinkedIn profiles, severely undermining the robustness of existing text-based detectors. This paper presents the first systematic evaluation of mainstream detectors under LLM-generated samples, revealing critical failure modes: specifically, false acceptance rates (FAR) as high as 42–52%. To address this, we propose a GPT-assisted adversarial training framework: it leverages GPT to synthesize high-quality adversarial examples and jointly models numerical features with multimodal text embeddings (e.g., BERT and Sentence-BERT). Ablation studies validate the efficacy of our feature fusion strategy. Experiments demonstrate that our approach reduces FAR to 1–7% while maintaining low false rejection rates (0.5–2%), significantly improving generalization and adversarial robustness. The framework establishes a scalable, interpretable paradigm for fake identity detection in the LLM era.
Abstract
Large Language Models (LLMs) have made it easier to create realistic fake profiles on platforms like LinkedIn. This poses a significant risk for text-based fake profile detectors. In this study, we evaluate the robustness of existing detectors against LLM-generated profiles. While highly effective at detecting manually created fake profiles (False Accept Rate: 6–7%), the existing detectors fail to identify GPT-generated profiles (False Accept Rate: 42–52%). We propose GPT-assisted adversarial training as a countermeasure, restoring the False Accept Rate to 1–7% without impacting the False Reject Rates (0.5–2%). Ablation studies reveal that detectors trained on combined numerical and textual embeddings exhibit the highest robustness, followed by those using numerical-only embeddings, and lastly those using textual-only embeddings. Complementary analysis of the detection ability of prompt-based GPT-4 Turbo and human evaluators affirms the need for robust automated detectors such as the one proposed in this study.
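The feature fusion highlighted in the ablation study can be illustrated with a minimal sketch. Here, random vectors stand in for Sentence-BERT text embeddings, the numerical profile features (e.g., connection count, profile completeness) are synthetic, and the detector is a tiny logistic regression trained from scratch in NumPy; all names and dimensions are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: 200 profiles, 5 numerical features and 16-dim
# text embeddings. In the paper's setup the embeddings would come from
# BERT / Sentence-BERT; class-shifted Gaussians stand in for them here.
n, n_num, n_txt = 200, 5, 16
labels = rng.integers(0, 2, size=n)  # 1 = fake, 0 = genuine
num_feats = rng.normal(loc=labels[:, None], scale=1.0, size=(n, n_num))
txt_embs = rng.normal(loc=labels[:, None], scale=1.0, size=(n, n_txt))

# Standardize numerical features so both views share a comparable scale.
num_feats = (num_feats - num_feats.mean(axis=0)) / num_feats.std(axis=0)

# Feature fusion: concatenate the numerical and textual views.
X = np.concatenate([num_feats, txt_embs], axis=1)

# Tiny logistic-regression detector trained by gradient descent.
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(fake)
    grad = p - labels                        # gradient of log loss
    w -= 0.1 * (X.T @ grad) / n
    b -= 0.1 * grad.mean()

preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(int)
accuracy = (preds == labels).mean()
print(f"training accuracy on fused features: {accuracy:.2f}")
```

Because both views carry class signal, the fused representation separates the classes more cleanly than either view alone, which mirrors the ablation finding that combined numerical-plus-textual detectors are the most robust.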