🤖 AI Summary
This work addresses the governance risks posed by unauthorized model merging, which can circumvent safety alignment or licensing restrictions. To counter this, the authors propose Trap², a framework that reshapes the model’s loss landscape during fine-tuning through a scaling-sensitive loss function. This design ensures high performance under legitimate usage while significantly degrading model utility when subjected to illicit merging. Trap² represents the first architecture-agnostic defense mechanism against model merging, embedding protection directly into the training process and supporting both full-model and adapter-based deployment paradigms. Experimental results demonstrate that Trap² effectively suppresses unauthorized merging attempts without compromising model efficacy in compliant scenarios.
📝 Abstract
The rise of model hubs has made it easier to access reusable model components, making model merging a practical tool for combining capabilities. Yet, this modularity also creates a \emph{governance gap}: downstream users can recompose released weights into unauthorized mixtures that bypass safety alignment or licensing terms. Because existing defenses are largely post-hoc and architecture-specific, they provide inconsistent protection across diverse architectures and release formats in practice. To close this gap, we propose \textsc{Trap}$^{2}$, an architecture-agnostic protection framework that encodes protection directly into the weight updates during fine-tuning, regardless of whether the weights are released as adapters or full models. Instead of relying on architecture-dependent approaches, \textsc{Trap}$^{2}$ uses weight re-scaling as a simple proxy for the merging process. It keeps released weights effective in standalone use, but degrades their utility under the re-scaling that merging typically induces, thereby thwarting unauthorized composition.
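The re-scaling proxy can be illustrated with a toy sketch. This is hypothetical code, not the authors' implementation: it only shows that in task-arithmetic-style merging, each contributor's weight delta is multiplied by its merge coefficient, so a loss made sensitive to such scaling would degrade exactly in the merged setting while leaving standalone use (scale 1.0) intact. All names and the merging rule below are illustrative assumptions.

```python
import numpy as np

def merge(base, deltas, coeffs):
    """Task-arithmetic-style merge: base + sum_i coeff_i * delta_i."""
    merged = base.copy()
    for d, c in zip(deltas, coeffs):
        merged += c * d
    return merged

rng = np.random.default_rng(0)
base = rng.normal(size=8)              # pretrained weights (toy vector)
delta_protected = rng.normal(size=8)   # the owner's released fine-tuning update
delta_other = rng.normal(size=8)       # another party's update

# Standalone use applies the full protected delta (effective scale = 1.0).
standalone = base + delta_protected

# Merging with coefficient 0.5 implicitly re-scales the protected delta.
merged = merge(base, [delta_protected, delta_other], [0.5, 0.5])

# Recover the effective scale applied to the protected update inside the merge:
residual = merged - base - 0.5 * delta_other
effective_scale = residual @ delta_protected / (delta_protected @ delta_protected)
print(round(effective_scale, 3))  # 0.5
```

Because the protected delta only ever appears at a reduced scale inside a merge, a training objective that penalizes performance under scaled-down deltas acts as a trap for merging without affecting the released model on its own.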