Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing concept erasure methods primarily target diffusion models and adapt poorly to visual autoregressive (VAR) models, where the multi-scale, next-scale token prediction architecture makes fine-tuning unstable. This work introduces VARE, the first dedicated concept erasure framework for VAR models, which leverages auxiliary visual tokens to reduce fine-tuning intensity and stabilize training. Building on it, the lightweight variant S-VARE adds a filtered cross-entropy loss to precisely identify and minimally adjust unsafe visual tokens, together with a preservation loss that maintains semantic fidelity. Extensive experiments across multiple benchmarks show that the approach achieves precise, minimally disruptive removal of unsafe visual concepts, improving VAR model safety without compromising image fidelity or diversity and closing a critical content-safety gap for autoregressive generative models.

📝 Abstract
The rapid progress of visual autoregressive (VAR) models has brought new opportunities for text-to-image generation, but also heightened safety concerns. Existing concept erasure techniques, primarily designed for diffusion models, fail to generalize to VARs due to their next-scale token prediction paradigm. In this paper, we first propose a novel VAR Erasure framework, VARE, that enables stable concept erasure in VAR models by leveraging auxiliary visual tokens to reduce fine-tuning intensity. Building upon this, we introduce S-VARE, a novel and effective concept erasure method designed for VAR, which incorporates a filtered cross-entropy loss to precisely identify and minimally adjust unsafe visual tokens, along with a preservation loss to maintain semantic fidelity, addressing issues such as language drift and reduced diversity introduced by naïve fine-tuning. Extensive experiments demonstrate that our approach achieves surgical concept erasure while preserving generation quality, thereby closing the safety gap left by earlier methods in autoregressive text-to-image generation.
Problem

Research questions and friction points this paper is trying to address.

Addressing safety gaps in autoregressive text-to-image generation models
Developing concept erasure techniques specifically for visual autoregressive models
Preventing language drift while maintaining generation quality during erasure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages auxiliary visual tokens for stable erasure
Uses filtered cross entropy loss for precise token adjustment
Incorporates preservation loss to maintain semantic fidelity
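The two losses listed above can be sketched as a single training objective. The following is a minimal illustration only, under assumptions not taken from the paper: the function name, the use of a KL term against a frozen reference model for preservation, and the per-position unsafe-token mask are all hypothetical stand-ins for the paper's actual formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def erasure_loss(logits, targets, unsafe_mask, ref_logits, lam=1.0):
    """Hypothetical sketch: filtered cross-entropy on unsafe token
    positions plus a preservation term on the remaining positions.

    logits:      (T, V) scores from the model being fine-tuned
    targets:     (T,)   target token ids
    unsafe_mask: (T,)   1.0 where a token is flagged as unsafe, else 0.0
    ref_logits:  (T, V) scores from a frozen reference model
    """
    T = logits.shape[0]
    probs = softmax(logits)

    # Filtered cross-entropy: penalize only tokens flagged as unsafe.
    ce = -np.log(probs[np.arange(T), targets] + 1e-12)
    erase = (ce * unsafe_mask).sum() / max(unsafe_mask.sum(), 1)

    # Preservation: stay close to the reference distribution elsewhere,
    # here via a per-position KL divergence (an assumed choice).
    p_ref = softmax(ref_logits)
    kl = (p_ref * (np.log(p_ref + 1e-12) - np.log(probs + 1e-12))).sum(-1)
    safe = 1.0 - unsafe_mask
    preserve = (kl * safe).sum() / max(safe.sum(), 1)

    return erase + lam * preserve
```

With an all-zero mask and identical model and reference logits, both terms vanish, so a well-behaved implementation should return zero loss in that case; masking any position makes the erasure term strictly positive.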
Xinhao Zhong
Harbin Institute of Technology, Shenzhen
Data-centric AI · Efficient AI
Yimin Zhou
Tsinghua Shenzhen International Graduate School, Tsinghua University
Zhiqi Zhang
Jilin University
Junhao Li
Assistant Project Scientist, Cognitive Science, University of California, San Diego
Non-coding RNAs · DNA methylation · Epigenetics · Bioinformatics
Yi Sun
Harbin Institute of Technology, Shenzhen
Bin Chen
Harbin Institute of Technology, Shenzhen; Peng Cheng Laboratory
Shu-Tao Xia
SIGS, Tsinghua University
Coding and information theory · Machine learning · Computer vision · AI security
Ke Xu
Department of Computer Science and Technology, Tsinghua University