🤖 AI Summary
Style-guided generation suffers from *stylization-induced truthfulness collapse*: existing representation-editing methods forcibly inject stylistic signals, contaminating the model's truthfulness representations and thereby degrading answer correctness. To address this, StyliTruth disentangles the style-relevant and truth-relevant subspaces via token-level orthogonal deflation within key attention heads, and introduces an adaptive, separable steering strategy that dynamically decouples and jointly modulates style and truthfulness during generation. The method significantly improves style adherence (+12.3% across multilingual and multi-style benchmarks) while preserving, and even enhancing, answer correctness (+4.1%), outperforming state-of-the-art inference-time intervention approaches.
📝 Abstract
Generating stylized large language model (LLM) responses via representation editing is a promising approach to fine-grained output control. However, there is an inherent trade-off: imposing a distinctive style often degrades truthfulness. Existing representation editing methods, by naively injecting style signals, overlook this collateral impact and frequently contaminate the model's core truthfulness representations, reducing answer correctness. We term this phenomenon stylization-induced truthfulness collapse. We attribute this issue to latent coupling between style and truth directions in certain key attention heads, and propose StyliTruth, a mechanism that preserves stylization while keeping truthfulness intact. StyliTruth separates the style-relevant and truth-relevant subspaces in the model's representation space via an orthogonal deflation process. This decomposition enables independent control of style and truth in their own subspaces, minimizing interference. By designing adaptive, token-level steering vectors within each subspace, we dynamically and precisely control the generation process to maintain both stylistic fidelity and truthfulness. We validate our method on multiple styles and languages. Extensive experiments and analyses show that StyliTruth significantly reduces stylization-induced truthfulness collapse and outperforms existing inference-time intervention methods in balancing style adherence with truthfulness.
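The key geometric idea, deflating the style direction against the truth subspace so that steering along it leaves truth-relevant projections untouched, can be sketched as follows. This is a minimal illustration with toy vectors, not the paper's implementation; the names `truth_basis`, `style_dir`, and the steering strength `alpha` are illustrative assumptions.

```python
import numpy as np

def orthogonal_deflation(style_dir: np.ndarray, truth_basis: np.ndarray) -> np.ndarray:
    """Remove the component of the style direction lying in the truth
    subspace (spanned by the rows of `truth_basis`), so steering along
    the result leaves truth-relevant activations unchanged."""
    q, _ = np.linalg.qr(truth_basis.T)  # columns of q: orthonormal truth basis
    deflated = style_dir - q @ (q.T @ style_dir)
    return deflated / np.linalg.norm(deflated)

def steer(hidden: np.ndarray, style_dir: np.ndarray,
          truth_basis: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Add a style steering vector orthogonal to the truth subspace.
    (`alpha` is a hypothetical fixed strength; the paper uses adaptive,
    token-level strengths.)"""
    s_orth = orthogonal_deflation(style_dir, truth_basis)
    return hidden + alpha * s_orth

# Toy check: steering shifts the hidden state but not its truth projections.
rng = np.random.default_rng(0)
truth_basis = rng.normal(size=(2, 8))   # two truth-relevant directions
style_dir = rng.normal(size=8)
h = rng.normal(size=8)
h_steered = steer(h, style_dir, truth_basis)

q, _ = np.linalg.qr(truth_basis.T)
assert np.allclose(q.T @ h, q.T @ h_steered)  # truth projections preserved
```

The toy check makes the mechanism concrete: because the deflated style direction is orthogonal to every truth direction, any steering strength changes only the style component of the hidden state.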