🤖 AI Summary
This work addresses the limited interpretability and poor transferability of soft prompt tuning, particularly when the target is a closed-source large language model. The authors train a dedicated model, across multiple datasets, that translates optimized soft prompts into human-readable natural-language (hard) prompts. Evaluated on multiple Datasets of Datasets (DoD) benchmarks, the translator generates fluent, accurate hard prompts that outperform training-free baselines such as InSPEcT. Soft prompts optimized on small open-source models can also be translated into text prompts that, when deployed on larger closed-API models, exceed the performance of the original soft prompt and, in some cases, even few-shot learning. The approach thereby improves both the interpretability and the practical utility of soft prompts.
📝 Abstract
Soft prompt tuning is a parameter-efficient method for adapting LLMs to specific tasks, but it suffers from a lack of interpretability. Building on recent work on interpreting soft prompts (Ramati et al., 2024), we explore how training a dedicated soft-prompt-to-natural-language translation model can yield higher translation quality. In particular, in both quantitative and qualitative comparisons on multiple Datasets of Datasets (DoDs), we demonstrate that our translator produces fluent, accurate verbalizations that outperform existing training-free methods like InSPEcT. In addition to advancing interpretability, our work suggests a promising downstream application: soft prompts optimized on small, open-source models can be translated into portable text prompts that, when deployed on larger closed-API models, exceed the performance of the original soft prompt and, in some cases, even few-shot learning.