🤖 AI Summary
To address the prevalent issue of insufficient response alignment in open-source large language models (LLMs), caused by a lack of high-quality, diverse system messages, this paper proposes SysGen, the first framework for instruction-aligned system message generation without human annotation. Methodologically, SysGen applies reverse prompt engineering to instruction-response pairs, combined with controllable text generation, multi-stage fine-tuning, and diversity-enhanced sampling, to support adaptation across multiple roles, formats, and stylistic preferences. On the Multifacet benchmark, SysGen achieves significant improvements in response alignment; on unseen evaluation benchmarks such as Open LLM Leaderboard 2, it preserves near-original general capabilities. Qualitative analysis further demonstrates that system message diversity is critical for cross-scenario generalization. This work establishes a scalable, low-cost paradigm for constructing highly adaptive system messages, advancing the practical deployment of open-source LLMs.
📄 Abstract
System messages play a crucial role in interactions with large language models (LLMs), often serving as prompts that initiate conversations. Through system messages, users can assign specific roles, perform intended tasks, incorporate background information, and specify various output formats and communication styles. Despite this versatility, publicly available data often lack system messages and are subject to strict license constraints in industry. Manually labeling publicly available data with system messages that align with user instructions demands significant resources. In view of these challenges, our work introduces SysGen, a pipeline for generating system messages, paired with better-aligned assistant responses, from supervised fine-tuning datasets that lack system messages. Training on SysGen data yields substantial improvements in the alignment of model responses with system messages and user instructions across various open-source models on the Multifacet benchmark, while having minimal impact on unseen benchmarks such as Open LLM Leaderboard 2. Our qualitative analysis highlights the importance of diverse system messages for better adaptability across different contexts.
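The core idea described above, inferring a system message from an existing instruction-response pair, can be sketched as below. This is a minimal illustrative sketch only: the function names, facet list, and prompt wording are assumptions for exposition, not the paper's actual prompts or implementation, and the call to a generator LLM is left out.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One supervised fine-tuning sample without a system message."""
    instruction: str
    response: str


# Facets a generated system message might cover, loosely mirroring the
# roles/tasks/background/format/style axes mentioned in the abstract.
# (Illustrative assumption, not the paper's exact facet set.)
FACETS = ["role", "task", "background", "format", "style"]


def build_reverse_prompt(ex: Example) -> str:
    """Compose a reverse-engineering prompt asking a generator LLM to
    produce a system message under which the given response would be a
    well-aligned answer to the given instruction."""
    facet_list = ", ".join(FACETS)
    return (
        "Given the user instruction and assistant response below, write a "
        f"system message covering these facets: {facet_list}.\n\n"
        f"Instruction: {ex.instruction}\n"
        f"Response: {ex.response}\n"
        "System message:"
    )


ex = Example(
    instruction="Summarize the plot of Hamlet in two sentences.",
    response="Prince Hamlet seeks revenge against his uncle Claudius...",
)
prompt = build_reverse_prompt(ex)
print(prompt)
```

In a full pipeline, the returned prompt would be sent to a generator LLM, and the resulting system message (with a regenerated, better-aligned response) would replace the original sample in the fine-tuning set.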