🤖 AI Summary
In goal-oriented customer service dialogue deployed at the edge or on premises, existing large language model (LLM) solutions struggle to balance performance, controllability, and cost: proprietary models (e.g., GPT-4) incur high licensing fees and cannot be self-hosted, while lightweight open-source models lack sufficient capability.
Method: We propose “policy distillation,” a novel black-box, interpretable knowledge transfer paradigm that alternates between two stages: scenario-aware generation and policy optimization. It constructs an auditable, transferable prompt policy library, without relying on parameter fine-tuning or response imitation, by integrating black-box API invocation, scenario-driven policy induction, and automated prompt engineering.
Contribution/Results: Experiments demonstrate substantial improvement in user satisfaction for lightweight LLMs on customer service tasks; the distilled policies exhibit strong generalization across models and tasks; and built-in human review support enhances safety and operational controllability.
📝 Abstract
Advanced large language models (LLMs) like GPT-4 or Llama 3 provide superior performance in complex human-like interactions. However, they are costly, too large for edge devices such as smartphones, or hard to self-host, which raises security and privacy concerns. This paper introduces a novel interpretable knowledge distillation approach to enhance the performance of smaller, more economical LLMs that firms can self-host. We study this problem in the context of building a customer service agent aimed at achieving high customer satisfaction through goal-oriented dialogues. Unlike traditional knowledge distillation, where the "student" model learns directly from the "teacher" model's responses via fine-tuning, our interpretable "strategy" teaching approach has the teacher provide strategies to improve the student's performance in various scenarios. The method alternates between a "scenario generation" step and a "strategies for improvement" step, creating a customized library of scenarios and optimized strategies for automated prompting. It requires only black-box access to both the student and teacher models, so it can be used without manipulating model parameters. In our customer service application, the method improves performance, and the learned strategies transfer to other LLMs and to scenarios beyond the training set. The method's interpretability helps safeguard against potential harms through human audit.
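The alternation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt wording, the function names `build_strategy_library` and `prompt_with_strategy`, and the exact-match lookup are all assumptions made for illustration. The only property taken from the abstract is that both models are used as black boxes (text in, text out) and that the output is a human-readable library mapping scenarios to strategies.

```python
from typing import Callable, Dict

# Black-box access: each model is just a function from prompt text to reply text.
# (Assumption: any API client could be wrapped to fit this signature.)
LLM = Callable[[str], str]


def build_strategy_library(teacher: LLM, student: LLM, rounds: int = 3) -> Dict[str, str]:
    """Alternate scenario generation and strategy improvement (hypothetical sketch)."""
    library: Dict[str, str] = {}
    for _ in range(rounds):
        # "Scenario generation" step: the teacher proposes a scenario not yet covered.
        known = "; ".join(library) or "none"
        scenario = teacher(f"Propose a new customer service scenario. Already covered: {known}")

        # The student attempts the scenario without any strategy.
        reply = student(f"Scenario: {scenario}\nRespond to the customer.")

        # "Strategies for improvement" step: the teacher critiques the reply
        # and returns a plain-text strategy that a human can read and audit.
        strategy = teacher(
            f"Scenario: {scenario}\nStudent reply: {reply}\n"
            "Give one strategy that would improve this reply."
        )
        library[scenario] = strategy
    return library


def prompt_with_strategy(library: Dict[str, str], scenario: str, message: str) -> str:
    """Build the student's prompt at deployment time, adding a stored strategy if any."""
    # Assumption: exact-match lookup; a real system would need some form of retrieval.
    strategy = library.get(scenario, "")
    prefix = f"Strategy: {strategy}\n" if strategy else ""
    return f"{prefix}Customer: {message}\nAgent:"
```

Because the library is an ordinary dictionary of text, a reviewer can inspect or edit every strategy before it is used in a prompt, which is the kind of human audit the abstract refers to.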