Problem
Understanding how censorship works in LLMs
Finding refusal-compliance vectors for censorship control
Uncovering thought suppression in reasoning LLMs
Innovation
Representation engineering for censorship control
Refusal-compliance vector detects output censorship
Thought suppression vector removes reasoning censorship
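The sketch below illustrates the representation-engineering idea behind these two vectors. It assumes a difference-of-means direction computed from hidden-state activations at a single layer. It uses that direction in two ways: detection, by thresholding each activation's projection onto the vector, and removal, by projecting the direction out of the activations (directional ablation). All function names, the toy activations, and the midpoint threshold are hypothetical. The paper's actual extraction procedure, layer choice, and intervention may differ.

```python
import numpy as np


def difference_of_means(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Unit vector from the mean of neg_acts toward the mean of pos_acts.

    Both inputs have shape (n_examples, hidden_dim) and are assumed to be
    activations already extracted from one layer of a model (hypothetical).
    """
    direction = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return direction / np.linalg.norm(direction)


def projection_scores(acts: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Scalar projection of each activation onto a unit direction."""
    return acts @ direction


def ablate_direction(acts: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component along a unit direction from every activation."""
    return acts - np.outer(acts @ direction, direction)


# Toy data (synthetic, not from the paper): "refusal" activations are shifted
# along a hidden unit direction, "compliance" activations the opposite way.
rng = np.random.default_rng(0)
hidden_dim = 16
true_dir = rng.normal(size=hidden_dim)
true_dir /= np.linalg.norm(true_dir)
refusal_acts = rng.normal(size=(50, hidden_dim)) + 3.0 * true_dir
compliance_acts = rng.normal(size=(50, hidden_dim)) - 3.0 * true_dir

# Estimate the refusal-compliance direction from labeled examples.
v = difference_of_means(refusal_acts, compliance_acts)

# Detection: flag activations whose projection exceeds the midpoint
# between the two class means (a hypothetical threshold rule).
threshold = 0.5 * (projection_scores(refusal_acts, v).mean()
                   + projection_scores(compliance_acts, v).mean())
all_acts = np.vstack([refusal_acts, compliance_acts])
labels = np.array([1] * 50 + [0] * 50)
predicted = (projection_scores(all_acts, v) > threshold).astype(int)
accuracy = (predicted == labels).mean()

# Removal: after ablation, no activation has a component along v.
ablated = ablate_direction(all_acts, v)
```

Projecting the direction out, rather than adding a fixed negative multiple of it, removes the component regardless of its magnitude in a given activation. Steering with a scaled vector is the other common choice and is equally plausible here.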