Concept Distillation from Strong to Weak Models via Hypotheses-to-Theories Prompting

📅 2024-08-18
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the poor prompting performance of weak language models on complex tasks and the high cost of manual prompt engineering, this paper proposes Concept Distillation (CD), an error-driven automatic prompt optimization framework that requires neither fine-tuning nor gradient updates. CD analyzes errors made by weak models to guide stronger models in performing inductive reasoning, thereby generating interpretable and transferable logical rules; these rules are rigorously validated and then injected into prompts. CD introduces a "Hypothesis → Theory" prompting paradigm, integrating inductive and deductive reasoning stages to jointly enhance generalizability and reliability. On Multi-Arith, Mistral-7B achieves a 20% accuracy gain; on HumanEval, Phi-3-mini-3.8B improves by 34%. CD significantly outperforms existing automated prompting methods and enables seamless cross-model transfer without any fine-tuning.

📝 Abstract
Hand-crafting high-quality prompts to optimize the performance of language models is a complicated and labor-intensive process. Furthermore, when migrating to newer, smaller, or weaker models (possibly for latency or cost gains), prompts need to be updated to re-optimize task performance. We propose Concept Distillation (CD), an automatic prompt optimization technique for enhancing weaker models on complex tasks. CD involves: (1) collecting mistakes made by weak models with a base prompt (initialization), (2) using a strong model to generate reasons for these mistakes and create rules/concepts for weak models (induction), and (3) filtering these rules based on validation-set performance and integrating them into the base prompt (deduction/verification). We evaluated CD on NL2Code and mathematical reasoning tasks, observing significant performance boosts for small and weaker language models. Notably, Mistral-7B's accuracy on Multi-Arith increased by 20%, and Phi-3-mini-3.8B's accuracy on HumanEval rose by 34%. Compared to other automated methods, CD offers an effective, cost-efficient strategy for improving weak models' performance on complex tasks and enables seamless workload migration across different language models without compromising performance.
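The three steps in the abstract can be sketched as a simple optimization loop. This is an illustrative reconstruction, not the authors' implementation: `weak_model` and `strong_model` stand in for LLM calls (a completion from the weak target model and a rule-inducing completion from the strong model, respectively), and the greedy rule-filtering criterion is an assumption.

```python
def concept_distillation(train_set, val_set, weak_model, strong_model,
                         base_prompt, threshold=0.0):
    """Sketch of the CD loop: initialization -> induction -> deduction/verification.

    weak_model(prompt, x)  -> the weak model's answer to input x under `prompt`
    strong_model(x, y)     -> a rule/concept induced by the strong model from
                              the weak model's mistake on (x, y)
    """
    # (1) Initialization: collect mistakes the weak model makes with the base prompt.
    mistakes = [(x, y) for x, y in train_set
                if weak_model(base_prompt, x) != y]

    # (2) Induction: the strong model explains each mistake and proposes a rule.
    rules = [strong_model(x, y) for x, y in mistakes]

    # (3) Deduction/verification: keep only rules that improve validation accuracy,
    # appending accepted rules to the prompt greedily.
    def accuracy(prompt):
        return sum(weak_model(prompt, x) == y for x, y in val_set) / len(val_set)

    prompt = base_prompt
    baseline = accuracy(prompt)
    for rule in rules:
        candidate = prompt + "\n" + rule
        gain = accuracy(candidate) - baseline
        if gain > threshold:
            prompt, baseline = candidate, baseline + gain
    return prompt
```

Because the output is just an augmented prompt, the distilled rules can be carried to another weak model without any gradient updates, which is the cross-model transfer property the paper emphasizes.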
Problem

Research questions and friction points this paper is trying to address.

Manual prompt engineering is complicated and labor-intensive
Weak language models perform poorly on complex tasks
Prompts must be re-optimized when migrating to newer, smaller, or weaker models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Error-driven automatic prompt optimization without fine-tuning
Strong model induces rules from weak-model mistakes
Rules validated on a held-out set and injected into the base prompt