AI Security Beyond Core Domains: Resume Screening as a Case Study of Adversarial Vulnerabilities in Specialized LLM Applications

📅 2025-12-23

📈 Citations: 0

✨ Influential: 0

career value

220K/year

🤖 AI Summary

This work exposes a critical security vulnerability in large language models (LLMs): their susceptibility to hidden adversarial instruction attacks in domain-specific applications—particularly resume screening—leading to task misalignment. To address this, we introduce the first adversarial benchmark tailored to resume screening, empirically demonstrating attack success rates exceeding 80%. We propose FIDS, a novel defense framework that integrates LoRA-adapted foreign instruction detection during training, enabling proactive identification and suppression of implicit adversarial instructions—a capability absent in prior approaches. Experimental results show that FIDS alone reduces attack success by 15.4% with only a 10.4% increase in false rejection rate; when combined with complementary strategies, it achieves 26.3% attack suppression. FIDS significantly outperforms existing prompt-engineering-based defenses, establishing a new state of the art in robustness for LLMs in high-stakes, domain-specific deployment scenarios.

Technology Category

Application Category

📝 Abstract

Large Language Models (LLMs) excel at text comprehension and generation, making them ideal for automated tasks like code review and content moderation. However, our research identifies a vulnerability: LLMs can be manipulated by "adversarial instructions" hidden in input data, such as resumes or code, causing them to deviate from their intended task. Notably, while defenses may exist for mature domains such as code review, they are often absent in other common applications such as resume screening and peer review. This paper introduces a benchmark to assess this vulnerability in resume screening, revealing attack success rates exceeding 80% for certain attack types. We evaluate two defense mechanisms: prompt-based defenses achieve 10.1% attack reduction with 12.5% false rejection increase, while our proposed FIDS (Foreign Instruction Detection through Separation) using LoRA adaptation achieves 15.4% attack reduction with 10.4% false rejection increase. The combined approach provides 26.3% attack reduction, demonstrating that training-time defenses outperform inference-time mitigations in both security and utility preservation.

Problem

Research questions and friction points this paper is trying to address.

Examines adversarial vulnerabilities in specialized LLM applications like resume screening

Introduces a benchmark to assess high attack success rates in these domains

Evaluates and compares defense mechanisms to reduce attacks while preserving utility

Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark assesses adversarial vulnerabilities in resume screening

FIDS uses LoRA adaptation for foreign instruction detection

Combined training-time defenses reduce attacks by 26.3%

🔎 Similar Papers

Large Language Models for Cyber Security: A Systematic Literature Review