Compressed Models are NOT Trust-equivalent to Their Large Counterparts

📅 2025-08-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether compressed deep learning models maintain "trust-equivalence" with their uncompressed counterparts—i.e., whether explanation consistency and probabilistic calibration remain preserved under comparable accuracy. Focusing on text classification, we propose the first two-dimensional trust-equivalence evaluation framework: it quantifies explanation alignment using LIME and SHAP, and assesses calibration similarity via Expected Calibration Error (ECE), Maximum Calibration Error (MCE), Brier Score, and reliability diagrams. Experiments on BERT-base and its compressed variants reveal that, despite near-identical accuracy, substantial discrepancies persist in explanation fidelity and calibration behavior—particularly in natural language inference and paraphrase identification tasks—uncovering a non-negligible trust gap. To our knowledge, this is the first systematic empirical demonstration that model compression degrades core trustworthiness attributes. Our framework establishes a foundational evaluation paradigm for trustworthy AI deployment, enabling rigorous assessment of post-compression model reliability.

📝 Abstract
Large deep learning models are often compressed before deployment in resource-constrained environments. Can we trust the predictions of compressed models just as we trust those of the original large model? Existing work has studied the effect of compression on accuracy and related performance measures, but performance parity does not guarantee trust-equivalence. We propose a two-dimensional framework for trust-equivalence evaluation. First, interpretability alignment measures whether the models base their predictions on the same input features; we quantify it with LIME and SHAP tests. Second, calibration similarity measures whether the models exhibit comparable reliability in their predicted probabilities; we assess it via ECE, MCE, Brier Score, and reliability diagrams. We conducted experiments using BERT-base as the large model and multiple compressed variants of it, focusing on two text classification tasks: natural language inference and paraphrase identification. Our results reveal low interpretability alignment and significant mismatch in calibration similarity, even when the models' accuracies are nearly identical. These findings show that compressed models are not trust-equivalent to their large counterparts, and that deploying them as drop-in replacements requires careful assessment beyond performance parity.
Problem

Research questions and friction points this paper is trying to address.

Evaluating trust-equivalence between compressed and large models
Assessing interpretability alignment via LIME and SHAP tests
Measuring calibration similarity using ECE, MCE, Brier Score, and reliability diagrams
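The calibration metrics named above have standard definitions that can be computed from a model's predicted probabilities. The paper does not specify its binning scheme, so the sketch below assumes the common choice of equal-width confidence bins for ECE/MCE; the Brier Score follows the usual multi-class squared-error form.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE and MCE over equal-width confidence bins.

    ECE: bin-size-weighted average of |mean confidence - accuracy| per bin.
    MCE: the largest such gap across all non-empty bins.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece, mce = 0.0, 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
            mce = max(mce, gap)
    return ece, mce

def brier_score(probs, labels):
    """Mean squared error between predicted class probabilities
    and the one-hot encoding of the true labels."""
    probs = np.asarray(probs, dtype=float)
    onehot = np.eye(probs.shape[1])[np.asarray(labels)]
    return float(np.mean(np.sum((probs - onehot) ** 2, axis=1)))
```

Comparing these values between the large model and a compressed variant on the same test set gives the "calibration similarity" dimension: similar accuracy with diverging ECE or Brier Score is exactly the trust gap the paper reports.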
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-dimensional trust-equivalence evaluation framework
LIME and SHAP tests for interpretability alignment
Calibration metrics including ECE, MCE, and Brier Score
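The paper does not spell out how agreement between LIME/SHAP explanations is scored, but a common way to quantify whether two models "base their predictions on the same input features" is the overlap of their top-k attributed features. The sketch below is a hypothetical illustration using Jaccard overlap over per-token attribution scores (such as SHAP values) from the large and compressed models.

```python
import numpy as np

def topk_alignment(attr_large, attr_small, k=5):
    """Jaccard overlap of the k features with the largest absolute
    attribution in each model's explanation for the same input.

    attr_large / attr_small: per-feature attribution scores
    (e.g., SHAP values or LIME weights) of equal length.
    Returns 1.0 for identical top-k sets, 0.0 for disjoint ones.
    """
    top_a = set(np.argsort(-np.abs(np.asarray(attr_large)))[:k])
    top_b = set(np.argsort(-np.abs(np.asarray(attr_small)))[:k])
    return len(top_a & top_b) / len(top_a | top_b)
```

Averaging this score over a test set would yield a single interpretability-alignment number per compressed variant; low average overlap despite matched accuracy is the kind of mismatch the paper reports.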