🤖 AI Summary
Existing targeted label-flipping attacks against detector-enhanced vertical federated learning (VFL) systems rely on access to model outputs and ground-truth labels, making them both impractical and vulnerable to detection by anomaly detectors. Method: We propose a two-stage black-box targeted attack framework that requires only local data and query access to the detector-enhanced VFL system, with no knowledge of true labels or model internals. Our approach employs Maximum Mean Discrepancy (MMD)-based sampling to select highly expressive local samples, constructs a local surrogate model and a detector emulator in the absence of labels or model outputs, and generates stealthy adversarial perturbations via gradient-guided optimization. Contribution/Results: Extensive experiments across four models, seven cross-modal datasets, and two detector types demonstrate that our method significantly outperforms four baselines, evades detection with high success rates, and maintains strong attack efficacy against three mainstream privacy-preserving defenses: differential privacy, secure aggregation, and gradient clipping.
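The MMD-based sampling step can be illustrated with a small sketch. The paper's exact selection criterion is not given here, so this is a hypothetical greedy variant: it repeatedly picks the local sample whose addition makes the chosen subset's distribution closest (in squared MMD under an RBF kernel) to the full local dataset, yielding a small but representative query budget.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel matrix between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    # Biased estimate of squared maximum mean discrepancy between X and Y.
    kxx = rbf_kernel(X, X, gamma).mean()
    kyy = rbf_kernel(Y, Y, gamma).mean()
    kxy = rbf_kernel(X, Y, gamma).mean()
    return kxx + kyy - 2.0 * kxy

def select_expressive(X, k, gamma=1.0):
    # Greedily pick k sample indices whose subset minimizes MMD to the
    # full local data X (a hypothetical stand-in for the paper's sampler).
    chosen, remaining = [], list(range(len(X)))
    for _ in range(k):
        best_i, best_m = None, np.inf
        for i in remaining:
            m = mmd2(X[chosen + [i]], X, gamma)
            if m < best_m:
                best_i, best_m = i, m
        chosen.append(best_i)
        remaining.remove(best_i)
    return chosen
```

The selected indices would then be submitted through the VFL inference protocol to harvest pseudo-labels for surrogate training.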
📝 Abstract
Vertical federated learning (VFL) enables multiple parties with disjoint features to collaboratively train models without sharing raw data. While the privacy vulnerabilities of VFL are extensively studied, its security threats, particularly targeted label attacks, remain underexplored. In such attacks, a passive party perturbs inputs at inference time to force misclassification into adversary-chosen labels. Existing methods rely on unrealistic assumptions (e.g., access to the VFL model's outputs) and ignore the anomaly detectors deployed in real-world systems. To bridge this gap, we introduce VTarbel, a two-stage, minimal-knowledge attack framework explicitly designed to evade detector-enhanced VFL inference. During the preparation stage, the attacker selects a minimal set of high-expressiveness samples (via maximum mean discrepancy), submits them through the VFL protocol to collect predicted labels, and uses these pseudo-labels to train an estimated detector and a surrogate model on local features. In the attack stage, these models guide gradient-based perturbation of the remaining samples, crafting adversarial instances that induce targeted misclassifications and evade detection. We implement VTarbel and evaluate it against four model architectures, seven multimodal datasets, and two anomaly detectors. Across all settings, VTarbel outperforms four state-of-the-art baselines, evades detection, and remains effective against three representative privacy-preserving defenses. These results reveal critical security blind spots in current VFL deployments and underscore the urgent need for robust, attack-aware defenses.
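The attack stage's gradient-guided perturbation can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a linear softmax surrogate (logits `W @ x + b`), a caller-supplied gradient of the estimated detector's anomaly score (`det_grad`, hypothetical), and an L-infinity budget `eps` that keeps the perturbation small for stealth. The loss balances cross-entropy toward the adversary's target class against the detector penalty.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def craft(x, W, b, target, det_grad, eps=0.3, lam=0.5, steps=50, lr=0.05):
    # Gradient-guided perturbation: push the surrogate's prediction toward
    # `target` while penalizing the emulated detector's anomaly score.
    # The perturbation delta is projected back into the L-inf ball of
    # radius eps after every step.
    delta = np.zeros_like(x)
    onehot = np.eye(W.shape[0])[target]
    for _ in range(steps):
        p = softmax(W @ (x + delta) + b)
        g_ce = W.T @ (p - onehot)          # grad of CE toward target w.r.t. input
        g = g_ce + lam * det_grad(x + delta)  # add detector-evasion term
        delta = np.clip(delta - lr * g, -eps, eps)
    return x + delta
```

In the full framework the surrogate would be a trained local model and `det_grad` would come from the detector emulator fit on pseudo-labeled queries; both are stand-ins here.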