🤖 AI Summary
Model immunization, i.e., pre-training models so that they are hard to fine-tune on harmful tasks while preserving utility on benign tasks, has so far lacked a rigorous theoretical foundation and a precise formal definition.
Method: This paper proposes a theoretical framework for model immunization based on the condition number of the Hessian matrix. It gives a condition-number-based formal definition of an immunized model and analyzes when immunization is feasible for linear models. Building on this analysis, the authors design a regularization-based pre-training algorithm that explicitly controls the resulting Hessian condition numbers, and validate it empirically on both linear models and non-linear deep networks.
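To make the central quantity concrete, here is a minimal sketch (my own illustration, not the paper's code): for linear least squares the Hessian is `X^T X / n`, and its condition number is the ratio of its largest to smallest eigenvalue. Data with a nearly degenerate direction yields an ill-conditioned Hessian.

```python
import numpy as np

# Illustration (assumed setup, not the paper's implementation): for
# linear regression with loss L(w) = ||Xw - y||^2 / (2n), the Hessian
# is H = X^T X / n. Its condition number kappa(H) = lambda_max / lambda_min
# is the quantity the paper's immunization framework controls.

def hessian_condition_number(X):
    """Condition number of the least-squares Hessian X^T X / n."""
    H = X.T @ X / X.shape[0]
    eigvals = np.linalg.eigvalsh(H)  # ascending order
    return eigvals[-1] / eigvals[0]

rng = np.random.default_rng(0)
X_benign = rng.standard_normal((100, 5))             # well-spread features
X_harmful = X_benign @ np.diag([1, 1, 1, 1, 1e-3])   # one near-degenerate direction

print(hessian_condition_number(X_benign))   # moderate
print(hessian_condition_number(X_harmful))  # several orders of magnitude larger
```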
Contribution/Results: Experiments on linear models and deep networks show that the proposed regularized pre-training makes fine-tuning on harmful tasks substantially harder while retaining utility on non-harmful tasks. The work thus pairs a precise, analyzable definition of model immunization with a practical algorithm for achieving it.
📝 Abstract
Model immunization aims to pre-train models that are difficult to fine-tune on harmful tasks while retaining their utility on other, non-harmful tasks. Though prior work has shown empirical evidence for immunizing text-to-image models, a precise understanding of when immunization is possible, and a formal definition of an immunized model, have remained unclear. In this work, we propose a framework based on the condition number of a Hessian matrix to analyze model immunization for linear models. Building on this framework, we design an algorithm with regularization terms to control the resulting condition numbers after pre-training. Empirical results on linear models and non-linear deep-nets demonstrate the effectiveness of the proposed algorithm on model immunization. The code is available at https://github.com/amberyzheng/model-immunization-cond-num.
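The intuition behind "difficult to fine-tune" can be sketched with a toy experiment (my own illustration under standard gradient-descent assumptions, not the paper's experiment): on a quadratic loss, gradient descent converges at a rate governed by the Hessian condition number, so a model whose harmful-task Hessian is ill-conditioned adapts to that task much more slowly.

```python
import numpy as np

# Toy sketch (assumed setup): run gradient descent on least squares and
# compare the remaining loss after a fixed budget of steps. The
# well-conditioned problem converges; the ill-conditioned one barely
# moves along its weak eigendirection, mimicking an "immunized" model.

def gd_loss_after(X, y, steps=200):
    """Training loss after `steps` of gradient descent with step 1/lambda_max."""
    n, d = X.shape
    H = X.T @ X / n
    lr = 1.0 / np.linalg.eigvalsh(H)[-1]  # stable step size
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * (X.T @ (X @ w - y)) / n
    return float(np.mean((X @ w - y) ** 2))

rng = np.random.default_rng(0)
w_true = np.ones(5)
X_good = rng.standard_normal((200, 5))              # small condition number
X_bad = X_good @ np.diag([1, 1, 1, 1, 0.02])        # much larger condition number
y_good, y_bad = X_good @ w_true, X_bad @ w_true

print(gd_loss_after(X_good, y_good))  # essentially converged
print(gd_loss_after(X_bad, y_bad))    # still far from the optimum of zero
```

Both problems are perfectly solvable (zero optimal loss); only the conditioning differs, which is the lever the paper's regularization targets.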