Towards Building Non-Fine-Tunable Foundation Models

📅 2026-01-31
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the security and economic risks posed by unauthorized fine-tuning of open-source foundation models. To mitigate this, the authors propose Private Mask Pre-Training (PMP), a framework that restricts representation learning during pre-training to a sparse subnetwork. While the full model weights are released publicly, the binary mask identifying the active subnetwork remains private. This design renders unauthorized fine-tuning ineffective due to parameter-space misalignment. PMP introduces the first tunable mechanism for non-fine-tunability: by adjusting the sparsity ratio of the mask, one can flexibly control the model's resistance to unauthorized adaptation without compromising its original performance. Theoretical analysis links gradient stability to the identifiability of the sparse subnetwork, and experiments on large language models demonstrate that PMP preserves foundational capabilities while significantly diminishing the gains from unauthorized fine-tuning across diverse downstream tasks.

πŸ“ Abstract
Open-sourcing foundation models (FMs) enables broad reuse but also exposes model trainers to economic and safety risks from unrestricted downstream fine-tuning. We address this problem by building non-fine-tunable foundation models: models that remain broadly usable in their released form while yielding limited adaptation gains under task-agnostic unauthorized fine-tuning. We propose Private Mask Pre-Training (PMP), a pre-training framework that concentrates representation learning into a sparse subnetwork identified early in training. The binary mask defining this subnetwork is kept private, and only the final dense weights are released. This forces unauthorized fine-tuning, which lacks access to the mask, to update parameters misaligned with the pre-training subspace, inducing an intrinsic mismatch between the fine-tuning objective and the pre-training geometry. We provide theoretical analysis showing that this mismatch destabilizes gradient-based adaptation and bounds fine-tuning gains. Empirical results on large language models demonstrate that PMP preserves base-model performance while consistently degrading unauthorized fine-tuning across a wide range of downstream tasks, with the strength of non-fine-tunability controlled by the mask ratio.
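The core mechanism described in the abstract can be sketched as a masked gradient update: only parameters selected by the private binary mask receive pre-training updates, so released dense weights carry a hidden sparse structure. The sketch below is a minimal illustration, not the authors' implementation; the function names are assumptions, and the mask here is drawn randomly for simplicity, whereas the paper identifies the subnetwork early in training.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_private_mask(shape, mask_ratio, rng):
    """Binary mask selecting a sparse subnetwork.
    In PMP this mask stays private; only dense weights are released.
    (Random selection here is an illustrative simplification.)"""
    return (rng.random(shape) < mask_ratio).astype(np.float64)

def masked_sgd_step(weights, grad, mask, lr=0.1):
    """Gradient update restricted to the private subnetwork:
    only entries where mask == 1 are modified."""
    return weights - lr * (grad * mask)

# Toy pre-training step on a 4x4 weight matrix.
W = rng.standard_normal((4, 4))
g = rng.standard_normal((4, 4))
mask = make_private_mask(W.shape, mask_ratio=0.25, rng=rng)

W_new = masked_sgd_step(W, g, mask)
```

An unauthorized fine-tuner who receives only `W_new` (dense, with no visible sparsity pattern) would update all parameters, including those outside the subnetwork that carries the learned representations; that misalignment is what the paper argues limits adaptation gains.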
Problem

Research questions and friction points this paper is trying to address.

foundation models
fine-tuning
model security
unauthorized adaptation
pre-training
Innovation

Methods, ideas, or system contributions that make the work stand out.

non-fine-tunable foundation models
Private Mask Pre-Training
sparse subnetwork
unauthorized fine-tuning
mask ratio
🔎 Similar Papers
No similar papers found.