🤖 AI Summary
Style-guided generation suffers from *stylization-induced truthfulness collapse*: existing representation-editing methods forcibly inject stylistic signals, contaminating the model's truthfulness representations and thereby degrading answer correctness. To address this, StyliTruth disentangles the style-relevant and truth-relevant subspaces via token-level orthogonal deflation within key attention heads, and introduces an adaptive, separable steering strategy that dynamically decouples and jointly modulates style and truthfulness during generation. The method significantly improves style adherence (+12.3% across multilingual and multi-style benchmarks) while preserving, and even enhancing, answer correctness (+4.1%), outperforming state-of-the-art inference-time intervention approaches.
📝 Abstract
Generating stylized large language model (LLM) responses via representation editing is a promising approach to fine-grained output control. However, there is an inherent trade-off: imposing a distinctive style often degrades truthfulness. Existing representation editing methods, by naively injecting style signals, overlook this collateral impact and frequently contaminate the model's core truthfulness representations, reducing answer correctness. We term this phenomenon stylization-induced truthfulness collapse. We attribute this issue to latent coupling between style and truth directions in certain key attention heads, and propose StyliTruth, a mechanism that preserves stylization while keeping truthfulness intact. StyliTruth separates the style-relevant and truth-relevant subspaces in the model's representation space via an orthogonal deflation process. This decomposition enables independent control of style and truth in their own subspaces, minimizing interference. By designing adaptive, token-level steering vectors within each subspace, we dynamically and precisely control the generation process to maintain both stylistic fidelity and truthfulness. We validate our method on multiple styles and languages. Extensive experiments and analyses show that StyliTruth significantly reduces stylization-induced truthfulness collapse and outperforms existing inference-time intervention methods in balancing style adherence with truthfulness.
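The key geometric idea, deflating the style direction against the truth subspace so that steering along it leaves truth-relevant projections untouched, can be sketched as follows. This is a minimal illustration with toy vectors, not the paper's implementation; the names `truth_basis`, `style_dir`, and the steering strength `alpha` are illustrative assumptions.

```python
import numpy as np

def orthogonal_deflation(style_dir: np.ndarray, truth_basis: np.ndarray) -> np.ndarray:
    """Remove the component of the style direction lying in the truth
    subspace (spanned by the rows of `truth_basis`), so steering along
    the result leaves truth-relevant activations unchanged."""
    q, _ = np.linalg.qr(truth_basis.T)  # columns of q: orthonormal truth basis
    deflated = style_dir - q @ (q.T @ style_dir)
    return deflated / np.linalg.norm(deflated)

def steer(hidden: np.ndarray, style_dir: np.ndarray,
          truth_basis: np.ndarray, alpha: float = 4.0) -> np.ndarray:
    """Add a style steering vector orthogonal to the truth subspace.
    (`alpha` is a hypothetical fixed strength; the paper uses adaptive,
    token-level strengths.)"""
    s_orth = orthogonal_deflation(style_dir, truth_basis)
    return hidden + alpha * s_orth

# Toy check: steering shifts the hidden state but not its truth projections.
rng = np.random.default_rng(0)
truth_basis = rng.normal(size=(2, 8))   # two truth-relevant directions
style_dir = rng.normal(size=8)
h = rng.normal(size=8)
h_steered = steer(h, style_dir, truth_basis)

q, _ = np.linalg.qr(truth_basis.T)
assert np.allclose(q.T @ h, q.T @ h_steered)  # truth projections preserved
```

The toy check makes the mechanism concrete: because the deflated style direction is orthogonal to every truth direction, any steering strength changes only the style component of the hidden state.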