🤖 AI Summary
This work addresses the poor generalization and transferability of system prompts for large language models (LLMs) across diverse tasks. We propose a bilevel optimization formulation for system prompts: the upper level optimizes a system prompt that is robust and transferable, while the lower level adapts task-specific user prompts to that system prompt. Our method adopts a MAML-style meta-learning framework that trains on multiple datasets jointly and refines the user prompts iteratively, so that the learned system prompt transfers to unseen tasks with little or no additional tuning. Evaluated on 14 unseen datasets spanning five domains, our approach improves cross-domain generalization and reduces the number of user-prompt optimization steps required at test time. The core contributions are: (i) the formalization of system prompt optimization as a bilevel learning problem, and (ii) a meta-learning framework that jointly optimizes system and user prompts so that the system prompt remains effective across tasks.
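As a reading aid, the bilevel structure described above can be written down roughly as follows. The notation is an assumption made here for illustration, not taken from the paper: $s$ is the system prompt, $u_i$ the user prompt for task $T_i$ in the training set $\mathcal{T}$, $(q, a)$ a query and reference answer, and $f$ a task metric.

```latex
% Upper level: choose the system prompt that does best on average across tasks,
% given that each task uses its own adapted user prompt.
s^{*} = \arg\max_{s} \; \frac{1}{|\mathcal{T}|} \sum_{T_i \in \mathcal{T}}
        \mathbb{E}_{(q,a) \sim T_i}
        \Big[ f\big(\mathrm{LLM}(s,\, u_i^{*}(s),\, q),\; a\big) \Big]

% Lower level: for a fixed system prompt s, adapt the user prompt of each task.
\text{s.t.} \quad
u_i^{*}(s) = \arg\max_{u} \; \mathbb{E}_{(q,a) \sim T_i}
        \Big[ f\big(\mathrm{LLM}(s,\, u,\, q),\; a\big) \Big]
        \quad \text{for each } T_i \in \mathcal{T}
```

Under this reading, the system prompt is scored only through the user prompts that were adapted to it, which is what ties the two levels together.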
📝 Abstract
Large Language Models (LLMs) have shown remarkable capabilities, and optimizing their input prompts plays a pivotal role in maximizing their performance. However, while LLM prompts consist of both task-agnostic system prompts and task-specific user prompts, existing work on prompt optimization has focused on user prompts specific to individual queries or tasks, and has largely overlooked the system prompt, which, once optimized, is applicable across different tasks and domains. Motivated by this, we introduce the novel problem of bilevel system prompt optimization, whose objective is to design system prompts that are robust to diverse user prompts and transferable to unseen tasks. To tackle this problem, we propose a meta-learning framework, which meta-learns the system prompt by optimizing it over various user prompts across multiple datasets, while simultaneously updating the user prompts in an iterative manner to ensure synergy between them. We conduct experiments on 14 unseen datasets spanning 5 different domains, on which we show that our approach produces system prompts that generalize effectively to diverse user prompts. Our findings also reveal that the optimized system prompt enables rapid adaptation even to unseen tasks, requiring fewer optimization steps for test-time user prompts while achieving improved performance.
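The alternating procedure described in the abstract can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the authors' implementation: prompts are chosen from small fixed candidate pools instead of being rewritten by an LLM, and `toy_score`, `bilevel_prompt_search`, and the demo data are hypothetical names introduced here. A real setup would replace `toy_score` with an evaluation of the model on each task's examples.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical scorer signature: (system_prompt, user_prompt, task_name) -> score.
Scorer = Callable[[str, str, str], float]


def best_of(candidates: List[str], objective: Callable[[str], float]) -> str:
    """Return the candidate with the highest objective value (first wins ties)."""
    return max(candidates, key=objective)


def bilevel_prompt_search(
    tasks: List[str],
    system_candidates: List[str],
    user_candidates: Dict[str, List[str]],
    score: Scorer,
    n_rounds: int = 3,
) -> Tuple[str, Dict[str, str]]:
    """Alternate between adapting user prompts and updating the system prompt."""
    system = system_candidates[0]
    users = {t: user_candidates[t][0] for t in tasks}
    for _ in range(n_rounds):
        # Lower level: adapt each task's user prompt to the current system prompt.
        for t in tasks:
            users[t] = best_of(user_candidates[t], lambda u: score(system, u, t))
        # Upper level: pick the system prompt with the best average score
        # across all tasks, each paired with its adapted user prompt.
        system = best_of(
            system_candidates,
            lambda s: sum(score(s, users[t], t) for t in tasks) / len(tasks),
        )
    return system, users


def toy_score(system: str, user: str, task: str) -> float:
    """Toy stand-in for an LLM evaluation (an assumption for illustration only)."""
    value = 0.0
    if "step by step" in system:
        value += 1.0  # pretend reasoning-oriented system prompts help everywhere
    if task in user:
        value += 1.0  # pretend task-specific user prompts help on their task
    if "step by step" in system and "briefly" in user:
        value -= 0.5  # pretend conflicting instructions hurt
    return value


# Hypothetical demo data.
TASKS = ["math", "law"]
SYSTEM_CANDIDATES = ["You are helpful.", "Think step by step."]
USER_CANDIDATES = {
    "math": ["Answer briefly.", "Solve this math problem."],
    "law": ["Answer briefly.", "Analyze this law question."],
}

if __name__ == "__main__":
    system, users = bilevel_prompt_search(
        TASKS, SYSTEM_CANDIDATES, USER_CANDIDATES, toy_score
    )
    print(system)
    print(users)
```

The point of the sketch is the loop order: user prompts are adapted first for a fixed system prompt, and the system prompt is then scored by its average across tasks, which favors a prompt that works with many user prompts over one tuned to a single task.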