🤖 AI Summary
Industrial-scale official account platforms demand dialogue agents that simultaneously ensure contextual relevance, stylistic consistency, and low-latency deployment, challenges that existing approaches fail to meet. This paper proposes WeStar, a unified framework addressing these requirements. First, it introduces a clustering-based, multi-dimensional parameter-sharing mechanism that enables lightweight modeling of style clusters. Second, it proposes style-enhanced Direct Preference Optimization (SeDPO), which jointly optimizes stylistic fidelity and response quality. Third, it integrates retrieval-augmented generation (RAG) with Parametric RAG (PRAG), dynamically activating style-cluster-specific LoRA modules to balance retrieval enhancement with efficient adaptation. Evaluated on a large-scale industrial dataset, WeStar significantly reduces inference latency and computational overhead while serving personalized responses for over one million accounts. It maintains high contextual relevance alongside strong stylistic consistency, demonstrating practical deployability in real-world scenarios.
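For orientation, SeDPO builds on the standard Direct Preference Optimization objective, which scores a policy's preferred response against a dispreferred one relative to a frozen reference model. The sketch below implements only the vanilla DPO loss; how SeDPO's style-enhanced preference pairs modify it is defined in the paper, not here.

```python
import math

def dpo_loss(logp_pi_w: float, logp_pi_l: float,
             logp_ref_w: float, logp_ref_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    logp_pi_*  : policy log-probs of the preferred (w) / dispreferred (l) response
    logp_ref_* : reference-model log-probs of the same responses
    beta       : inverse-temperature on the implicit reward margin
    """
    # Implicit reward margin between preferred and dispreferred responses.
    margin = beta * ((logp_pi_w - logp_ref_w) - (logp_pi_l - logp_ref_l))
    # -log sigmoid(margin): small when the policy already prefers the winner.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference model exactly, the margin is zero and the loss is log 2, the usual starting point before preference fine-tuning pushes the margin positive.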
📝 Abstract
Conversational agents deployed in industrial-scale official account platforms must generate responses that are both contextually grounded and stylistically aligned, requirements that existing methods struggle to meet. Chain-of-thought (CoT) prompting induces significant latency due to multi-turn reasoning; per-account fine-tuning is computationally prohibitive; and long prompt-based methods degrade the model's ability to grasp injected context and style. In this paper, we propose WeStar, a lite-adaptive framework for stylized contextual question answering that scales to millions of official accounts. WeStar combines context-grounded generation via RAG with style-aware generation using Parametric RAG (PRAG), where LoRA modules are dynamically activated per style cluster. Our contributions are fourfold: (1) We introduce WeStar, a unified framework capable of serving large volumes of official accounts with minimal overhead. (2) We propose a multi-dimensional, cluster-based parameter sharing scheme that enables compact style representation while preserving stylistic diversity. (3) We develop a style-enhanced Direct Preference Optimization (SeDPO) method to optimize each style cluster's parameters for improved generation quality. (4) Experiments on a large-scale industrial dataset validate the effectiveness and efficiency of WeStar, underscoring its practical value in real-world deployment.
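The per-cluster activation idea can be made concrete with a small sketch: accounts map to style clusters, each cluster owns one shared low-rank (LoRA) adapter, and at inference the router applies only that cluster's update on top of the frozen base model's output. All names here (`LoRAAdapter`, `StyleClusterRouter`) are illustrative assumptions, not the paper's actual implementation.

```python
class LoRAAdapter:
    """Low-rank residual update: delta(x) = (alpha / r) * B @ (A @ x)."""

    def __init__(self, A, B, alpha=1.0):
        self.A = A            # r x d_in  down-projection
        self.B = B            # d_out x r up-projection
        self.r = len(A)       # LoRA rank
        self.alpha = alpha    # scaling factor

    def delta(self, x):
        # Down-project: A @ x -> r-dimensional bottleneck.
        ax = [sum(a * xi for a, xi in zip(row, x)) for row in self.A]
        scale = self.alpha / self.r
        # Up-project and scale: (alpha / r) * B @ ax.
        return [scale * sum(b * a for b, a in zip(row, ax)) for row in self.B]


class StyleClusterRouter:
    """Routes each account to its style cluster's shared adapter."""

    def __init__(self, account_to_cluster, cluster_adapters):
        self.account_to_cluster = account_to_cluster  # account id -> cluster id
        self.cluster_adapters = cluster_adapters      # cluster id -> LoRAAdapter

    def adapt(self, account_id, base_output):
        cluster = self.account_to_cluster[account_id]
        adapter = self.cluster_adapters[cluster]
        # Frozen base output plus the cluster's style-specific correction.
        return [y + d for y, d in zip(base_output, adapter.delta(base_output))]
```

Because many accounts share one cluster, the number of stored adapters grows with the number of clusters, not accounts, which is the source of the claimed parameter savings.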