Temperature Scaling Attack Disrupting Model Confidence in Federated Learning

📅 2026-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies a previously overlooked threat in federated learning: the integrity of prediction confidence calibration, which is critical for high-stakes decision-making yet ignored by existing attacks that target model accuracy or implant backdoors. The paper introduces the Temperature Scaling Attack (TSA), the first method to explicitly compromise calibration fidelity, treating it as a new attack surface. TSA dynamically couples temperature scaling with the local learning rate during client training, significantly degrading calibration (increasing expected calibration error by up to 145%) while preserving model accuracy (within a 2% drop). Notably, under non-IID data distributions, TSA keeps malicious updates indistinguishable from benign client behavior, thereby evading mainstream defenses. In safety-critical applications such as healthcare and autonomous driving, TSA amplifies critical false negatives or false positives by up to 7.2× and remains effective against robust aggregation rules and post-hoc calibration techniques.
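
To make the mechanism concrete, the sketch below shows what such a calibration-poisoning client update could look like in PyTorch. It is a minimal illustration under stated assumptions, not the paper's implementation: the linear schedule `T = 1 + coupling * lr` and the names `tsa_local_update` and `coupling` are hypothetical, since the summary states only that the temperature is coupled to the local learning rate.

```python
import torch
import torch.nn.functional as F

def tsa_local_update(model, loader, lr, coupling=5.0, epochs=1):
    """Illustrative malicious client round: train on temperature-scaled
    logits so the uploaded update distorts the confidence the model learns.

    The linear coupling T = 1 + coupling * lr is a hypothetical schedule;
    the source states only that temperature and learning rate are coupled.
    """
    temperature = 1.0 + coupling * lr
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for inputs, targets in loader:
            optimizer.zero_grad()
            logits = model(inputs)
            # Dividing logits by T > 1 preserves their ordering, so the
            # predicted class (and hence accuracy) is unchanged, while the
            # softmax is flattened and the learned confidence is distorted.
            loss = F.cross_entropy(logits / temperature, targets)
            loss.backward()
            optimizer.step()
    return model.state_dict()  # uploaded like any benign client update
```

Because tempered cross-entropy only rescales and softens the gradient signal, the resulting updates stay close to benign optimization behavior, which is consistent with the indistinguishability claim above.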

📝 Abstract
Predictive confidence serves as a foundational control signal in mission-critical systems, directly governing risk-aware logic such as escalation, abstention, and conservative fallback. While prior federated learning attacks predominantly target accuracy or implant backdoors, we identify confidence calibration as a distinct attack objective. We present the Temperature Scaling Attack (TSA), a training-time attack that degrades calibration while preserving accuracy. By injecting temperature scaling with learning-rate-temperature coupling during local training, malicious updates maintain benign-like optimization behavior, evading accuracy-based monitoring and similarity-based detection. We provide a convergence analysis under non-IID settings, showing that this coupling preserves standard convergence bounds while systematically distorting confidence. Across three benchmarks, TSA substantially shifts calibration (e.g., a 145% error increase on CIFAR-100) with <2% accuracy change, and remains effective under robust aggregation and post-hoc calibration defenses. Case studies further show that confidence manipulation can cause up to 7.2× increases in missed critical cases (healthcare) or false alarms (autonomous driving), even when accuracy is unchanged. Overall, our results establish calibration integrity as a critical attack surface in federated learning.
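
For reference, the calibration metric behind figures like the 145% increase is expected calibration error (ECE). The implementation below is a standard textbook version, not code from the paper; the 15-bin default is a common convention rather than a detail of this work.

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    """ECE: bin samples by confidence (max softmax probability), then take
    the bin-size-weighted mean |accuracy - average confidence| gap."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        bin_acc = (predictions[in_bin] == labels[in_bin]).mean()
        bin_conf = confidences[in_bin].mean()
        ece += in_bin.mean() * abs(bin_conf - bin_acc)
    return ece
```

A well-calibrated model keeps this value near zero; the attack's reported effect is to inflate it while leaving top-1 predictions, and hence accuracy, nearly intact.
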
Problem

Research questions and friction points this paper is trying to address.

confidence calibration
federated learning
adversarial attack
model reliability
predictive uncertainty
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temperature Scaling Attack
Confidence Calibration
Federated Learning
Training-time Attack
Non-IID Convergence

👥 Authors

Kichang Lee
Ph.D. Student, Yonsei University
Machine Learning · Deep Learning · Medical AI · Mobile Computing · Security

Jaeho Jin
College of Computing, Yonsei University

JaeYeon Park
Department of Mobile Systems Engineering, Dankook University

Songkuk Kim
College of Computing, Yonsei University

JeongGil Ko
Associate Professor, Yonsei University
Mobile Computing · Internet of Things · Intelligent Cyber Physical Systems · Wireless Healthcare · Low-Power Embedded Systems