🤖 AI Summary
Unscrupulous institutions can exploit the "abstention" mechanism of machine learning models, which is designed to withhold predictions under uncertainty, to deny services in a discriminatory way. This paper identifies and formally defines the Mirage attack: an adversarial strategy in which the attacker artificially suppresses model confidence in targeted input regions to feign uncertainty. This disguises malicious abstention as caution while keeping predictive performance high.
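To make the threat concrete, here is a minimal sketch of confidence-threshold abstention and a Mirage-style wrapper. It is an illustration under stated assumptions, not the paper's attack. How Mirage actually lowers confidence is not described here, so as a stand-in the wrapper divides the logits by a large temperature whenever a hypothetical `in_target_region` flag is set. Temperature scaling never changes the argmax, so the underlying prediction stays correct while its confidence drops below the abstention threshold. The names `predict_with_abstention` and `mirage_predict`, and the values `threshold=0.8` and `suppress_temperature=10.0`, are illustrative choices.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; larger temperatures flatten the distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_abstention(logits, threshold=0.8, temperature=1.0):
    """Return (decision, label, confidence); decision is None when the model abstains."""
    probs = softmax(logits, temperature)
    conf = max(probs)
    label = probs.index(conf)  # argmax is unaffected by any temperature > 0
    decision = None if conf < threshold else label
    return decision, label, conf


def mirage_predict(logits, in_target_region, threshold=0.8, suppress_temperature=10.0):
    """Hypothetical Mirage-style wrapper: flatten confidence only inside the target region.

    Outside the region the model behaves honestly. Inside it, the top label is
    unchanged (so accuracy is preserved) but confidence falls below the threshold,
    which triggers an abstention that looks like genuine uncertainty.
    """
    t = suppress_temperature if in_target_region else 1.0
    return predict_with_abstention(logits, threshold, t)


if __name__ == "__main__":
    logits = [4.0, 0.0, 0.0]  # class 0 is clearly the right answer
    print(mirage_predict(logits, in_target_region=False))  # answers 0, confidence ~0.965
    print(mirage_predict(logits, in_target_region=True))   # abstains (None), label still 0, confidence ~0.427
```

The point of the example is that an observer who sees only the decisions cannot distinguish the second call from honest caution. This is why the defense has to examine confidence itself rather than accuracy.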
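Method: We propose Confidential Guardian, a framework that combines two components. The first is statistical calibration analysis on a reference dataset, which detects artificially suppressed confidence. The second is zero-knowledge proofs of verified inference, which ensure that reported confidence scores genuinely originate from the deployed model.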
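A calibration check of this kind can be sketched as follows, assuming access to a labeled reference dataset together with the model's reported confidences. The code groups predictions into confidence bins and computes expected calibration error (ECE). It then flags bins whose accuracy exceeds their mean confidence by more than a tolerance. That pattern is the signature of artificially suppressed confidence: a low-confidence region in which the model is nonetheless almost always right. The function names, the 10 equal-width bins, `tolerance=0.15`, and `min_count=20` are hypothetical choices. The framework's actual metrics and decision rule may differ, and the zero-knowledge component is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class BinStats:
    lo: float         # lower edge of the confidence bin
    hi: float         # upper edge of the confidence bin
    count: int        # number of reference predictions in the bin
    mean_conf: float  # average reported confidence in the bin
    accuracy: float   # fraction of correct predictions in the bin


def reliability_bins(confidences, correct, n_bins=10):
    """Group reference-set predictions (confidences in [0, 1]) into equal-width bins."""
    buckets = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        buckets[idx].append((c, ok))
    stats = []
    for i, b in enumerate(buckets):
        if not b:
            continue  # skip empty bins
        mean_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, ok in b if ok) / len(b)
        stats.append(BinStats(i / n_bins, (i + 1) / n_bins, len(b), mean_conf, acc))
    return stats


def expected_calibration_error(stats):
    """Count-weighted average of |accuracy - confidence| over non-empty bins."""
    n = sum(s.count for s in stats)
    return sum(s.count / n * abs(s.accuracy - s.mean_conf) for s in stats)


def flag_underconfidence(stats, tolerance=0.15, min_count=20):
    """Return bins that are much more accurate than their confidence suggests."""
    return [
        s for s in stats
        if s.count >= min_count and s.accuracy - s.mean_conf > tolerance
    ]


if __name__ == "__main__":
    # Synthetic reference set: 100 honest high-confidence predictions (95 correct)
    # plus 50 predictions in a suppressed region with confidence 0.45, all correct.
    confs = [0.95] * 100 + [0.45] * 50
    correct = [True] * 95 + [False] * 5 + [True] * 50
    stats = reliability_bins(confs, correct)
    print(round(expected_calibration_error(stats), 3))  # 0.183
    for s in flag_underconfidence(stats):
        print(f"suspicious bin [{s.lo:.1f}, {s.hi:.1f}): "
              f"conf={s.mean_conf:.2f}, acc={s.accuracy:.2f}")
```

On this synthetic reference set, the 0.95-confidence bin is well calibrated. The 0.4 to 0.5 bin, by contrast, reaches 100% accuracy at 45% confidence and is flagged. Checking the per-bin gap, rather than ECE alone, matters because a small suppressed region barely moves the aggregate ECE. The check also only works if the reference dataset covers the targeted inputs.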
Contribution/Results: The framework detects artificially suppressed confidence and prevents the provider from fabricating confidence scores, while keeping the model's proprietary details confidential. It provides verifiable assurances that abstention reflects genuine model uncertainty rather than adversarial manipulation, laying a foundation for fair, transparent, and auditable abstention mechanisms.
📝 Abstract
Cautious predictions, where a machine learning model abstains when uncertain, are crucial for limiting harmful errors in safety-critical applications. In this work, we identify a novel threat: a dishonest institution can exploit these mechanisms to discriminate or unjustly deny services under the guise of uncertainty. We demonstrate the practicality of this threat by introducing an uncertainty-inducing attack called Mirage, which deliberately reduces confidence in targeted input regions, thereby covertly disadvantaging specific individuals. At the same time, Mirage maintains high predictive performance across all data points. To counter this threat, we propose Confidential Guardian, a framework that analyzes calibration metrics on a reference dataset to detect artificially suppressed confidence. Additionally, it employs zero-knowledge proofs of verified inference to ensure that reported confidence scores genuinely originate from the deployed model. This prevents the provider from fabricating arbitrary model confidence values while protecting the model's proprietary details. Our results confirm that Confidential Guardian effectively prevents the misuse of cautious predictions, providing verifiable assurances that abstention reflects genuine model uncertainty rather than malicious intent.