🤖 AI Summary
Model immunization, i.e., pre-training models so that they are hard to fine-tune on harmful tasks while preserving utility on benign tasks, has so far lacked a rigorous theoretical foundation and a precise formal definition.
Method: This paper proposes a theoretical framework for model immunization based on the condition number of the Hessian matrix. It gives a condition-number-based formal definition of an immunized model and analyzes when immunization is feasible for linear models. Building on this analysis, the authors design a regularization-based pre-training algorithm that explicitly controls the resulting Hessian condition numbers, and validate it empirically on both linear models and non-linear deep networks.
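To make the central quantity concrete, here is a minimal sketch (my own illustration, not the paper's code): for linear least squares the Hessian is `X^T X / n`, and its condition number is the ratio of its largest to smallest eigenvalue. Data with a nearly degenerate direction yields an ill-conditioned Hessian.

```python
import numpy as np

# Illustration (assumed setup, not the paper's implementation): for
# linear regression with loss L(w) = ||Xw - y||^2 / (2n), the Hessian
# is H = X^T X / n. Its condition number kappa(H) = lambda_max / lambda_min
# is the quantity the paper's immunization framework controls.

def hessian_condition_number(X):
    """Condition number of the least-squares Hessian X^T X / n."""
    H = X.T @ X / X.shape[0]
    eigvals = np.linalg.eigvalsh(H)  # ascending order
    return eigvals[-1] / eigvals[0]

rng = np.random.default_rng(0)
X_benign = rng.standard_normal((100, 5))             # well-spread features
X_harmful = X_benign @ np.diag([1, 1, 1, 1, 1e-3])   # one near-degenerate direction

print(hessian_condition_number(X_benign))   # moderate
print(hessian_condition_number(X_harmful))  # several orders of magnitude larger
```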
Contribution/Results: Experiments on linear models and deep networks show that the proposed regularized pre-training makes fine-tuning on harmful tasks substantially harder while retaining utility on non-harmful tasks. The work thus pairs a precise, analyzable definition of model immunization with a practical algorithm for achieving it.
📝 Abstract
Model immunization aims to pre-train models that are difficult to fine-tune on harmful tasks while retaining their utility on other, non-harmful tasks. Though prior work has shown empirical evidence for immunizing text-to-image models, a precise understanding of when immunization is possible, and a formal definition of an immunized model, have remained unclear. In this work, we propose a framework based on the condition number of a Hessian matrix to analyze model immunization for linear models. Building on this framework, we design an algorithm with regularization terms to control the resulting condition numbers after pre-training. Empirical results on linear models and non-linear deep-nets demonstrate the effectiveness of the proposed algorithm on model immunization. The code is available at https://github.com/amberyzheng/model-immunization-cond-num.
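The intuition behind "difficult to fine-tune" can be sketched with a toy experiment (my own illustration under standard gradient-descent assumptions, not the paper's experiment): on a quadratic loss, gradient descent converges at a rate governed by the Hessian condition number, so a model whose harmful-task Hessian is ill-conditioned adapts to that task much more slowly.

```python
import numpy as np

# Toy sketch (assumed setup): run gradient descent on least squares and
# compare the remaining loss after a fixed budget of steps. The
# well-conditioned problem converges; the ill-conditioned one barely
# moves along its weak eigendirection, mimicking an "immunized" model.

def gd_loss_after(X, y, steps=200):
    """Training loss after `steps` of gradient descent with step 1/lambda_max."""
    n, d = X.shape
    H = X.T @ X / n
    lr = 1.0 / np.linalg.eigvalsh(H)[-1]  # stable step size
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * (X.T @ (X @ w - y)) / n
    return float(np.mean((X @ w - y) ** 2))

rng = np.random.default_rng(0)
w_true = np.ones(5)
X_good = rng.standard_normal((200, 5))              # small condition number
X_bad = X_good @ np.diag([1, 1, 1, 1, 0.02])        # much larger condition number
y_good, y_bad = X_good @ w_true, X_bad @ w_true

print(gd_loss_after(X_good, y_good))  # essentially converged
print(gd_loss_after(X_bad, y_bad))    # still far from the optimum of zero
```

Both problems are perfectly solvable (zero optimal loss); only the conditioning differs, which is the lever the paper's regularization targets.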