🤖 AI Summary
Industrial-scale official account platforms demand dialogue agents that simultaneously ensure contextual relevance, stylistic consistency, and low-latency deployment, challenges that existing approaches fail to meet. This paper proposes WeStar, a unified framework addressing these requirements. First, it introduces a clustering-based, multi-dimensional parameter-sharing mechanism that enables lightweight modeling of style clusters. Second, it proposes style-enhanced Direct Preference Optimization (SeDPO), which jointly optimizes stylistic fidelity and response quality. Third, it integrates retrieval-augmented generation (RAG) with Parametric RAG (PRAG), dynamically activating style-cluster-specific LoRA modules to balance retrieval enhancement with efficient adaptation. Evaluated on a large-scale industrial dataset, WeStar significantly reduces inference latency and computational overhead while serving personalized responses for over one million accounts. It maintains high contextual relevance alongside strong stylistic consistency, demonstrating practical deployability in real-world scenarios.
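For orientation, SeDPO builds on the standard Direct Preference Optimization objective, which scores a policy's preferred response against a dispreferred one relative to a frozen reference model. The sketch below implements only the vanilla DPO loss; how SeDPO's style-enhanced preference pairs modify it is defined in the paper, not here.

```python
import math

def dpo_loss(logp_pi_w: float, logp_pi_l: float,
             logp_ref_w: float, logp_ref_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    logp_pi_*  : policy log-probs of the preferred (w) / dispreferred (l) response
    logp_ref_* : reference-model log-probs of the same responses
    beta       : inverse-temperature on the implicit reward margin
    """
    # Implicit reward margin between preferred and dispreferred responses.
    margin = beta * ((logp_pi_w - logp_ref_w) - (logp_pi_l - logp_ref_l))
    # -log sigmoid(margin): small when the policy already prefers the winner.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference model exactly, the margin is zero and the loss is log 2, the usual starting point before preference fine-tuning pushes the margin positive.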
📝 Abstract
Conversational agents deployed in industrial-scale official account platforms must generate responses that are both contextually grounded and stylistically aligned, requirements that existing methods struggle to meet. Chain-of-thought (CoT) prompting induces significant latency due to multi-turn reasoning; per-account fine-tuning is computationally prohibitive; and long prompt-based methods degrade the model's ability to grasp injected context and style. In this paper, we propose WeStar, a lite-adaptive framework for stylized contextual question answering that scales to millions of official accounts. WeStar combines context-grounded generation via RAG with style-aware generation using Parametric RAG (PRAG), where LoRA modules are dynamically activated per style cluster. Our contributions are fourfold: (1) We introduce WeStar, a unified framework capable of serving large volumes of official accounts with minimal overhead. (2) We propose a multi-dimensional, cluster-based parameter sharing scheme that enables compact style representation while preserving stylistic diversity. (3) We develop a style-enhanced Direct Preference Optimization (SeDPO) method to optimize each style cluster's parameters for improved generation quality. (4) Experiments on a large-scale industrial dataset validate the effectiveness and efficiency of WeStar, underscoring its practical value in real-world deployment.
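The per-cluster activation idea can be made concrete with a small sketch: accounts map to style clusters, each cluster owns one shared low-rank (LoRA) adapter, and at inference the router applies only that cluster's update on top of the frozen base model's output. All names here (`LoRAAdapter`, `StyleClusterRouter`) are illustrative assumptions, not the paper's actual implementation.

```python
class LoRAAdapter:
    """Low-rank residual update: delta(x) = (alpha / r) * B @ (A @ x)."""

    def __init__(self, A, B, alpha=1.0):
        self.A = A            # r x d_in  down-projection
        self.B = B            # d_out x r up-projection
        self.r = len(A)       # LoRA rank
        self.alpha = alpha    # scaling factor

    def delta(self, x):
        # Down-project: A @ x -> r-dimensional bottleneck.
        ax = [sum(a * xi for a, xi in zip(row, x)) for row in self.A]
        scale = self.alpha / self.r
        # Up-project and scale: (alpha / r) * B @ ax.
        return [scale * sum(b * a for b, a in zip(row, ax)) for row in self.B]


class StyleClusterRouter:
    """Routes each account to its style cluster's shared adapter."""

    def __init__(self, account_to_cluster, cluster_adapters):
        self.account_to_cluster = account_to_cluster  # account id -> cluster id
        self.cluster_adapters = cluster_adapters      # cluster id -> LoRAAdapter

    def adapt(self, account_id, base_output):
        cluster = self.account_to_cluster[account_id]
        adapter = self.cluster_adapters[cluster]
        # Frozen base output plus the cluster's style-specific correction.
        return [y + d for y, d in zip(base_output, adapter.delta(base_output))]
```

Because many accounts share one cluster, the number of stored adapters grows with the number of clusters, not accounts, which is the source of the claimed parameter savings.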