🤖 AI Summary
Large language model (LLM) performance is highly sensitive to prompt template design, and manual prompt engineering incurs substantial computational and human effort.
Method: We observe strong cross-model consistency in prompt preferences across LLMs of varying scales on diverse tasks. Leveraging this insight, we propose a novel "small-model-driven automatic prompt selection for large models" paradigm: training a lightweight prompt evaluator to predict high-performing prompts for target LLMs, with built-in support for cross-scale generalization. Our approach integrates multi-LLM comparative prompt preference analysis, zero- and few-shot prompt evaluation, and cross-model transfer strategies.
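The selection step described above can be sketched in a few lines: score each candidate template with a small, cheap model, then hand only the top-ranked template to the large model. This is a minimal illustration of the paradigm, not the paper's implementation; the scorer here is a hypothetical stand-in for the small model's dev-set accuracy or log-likelihood, and the template strings are invented examples.

```python
# Sketch of "small model selects prompts for the large model", assuming
# the cross-model prompt-preference consistency observed in the paper.
# `score_with_small_model` is a hypothetical stand-in for a real scorer
# (e.g. the small LLM's accuracy or log-likelihood on a small dev set).

def select_prompt(templates, score_with_small_model):
    """Rank candidate templates by the small model's score; return the best."""
    scored = [(score_with_small_model(t), t) for t in templates]
    scored.sort(reverse=True)  # highest-scoring template first
    return scored[0][1]

# Hypothetical candidate templates for a QA task.
templates = [
    "Question: {q}\nAnswer:",
    "Q: {q}\nA:",
    "Answer the following question.\n{q}\n",
]

# Toy scores standing in for the small model's evaluation of each template.
toy_scores = {templates[0]: 0.71, templates[1]: 0.64, templates[2]: 0.58}

best = select_prompt(templates, toy_scores.get)
print(best)  # the template the small model ranks highest
```

Because only the lightweight evaluator scores every candidate, the expensive large model is queried with a single template, which is where the cost saving comes from.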
Contribution/Results: Evaluated across 14 LLMs on question answering, natural language inference, and other tasks, our selected prompts achieve state-of-the-art performance while drastically reducing the computational overhead and manual labor traditionally required for prompt engineering.
📄 Abstract
The performance of pre-trained Large Language Models (LLMs) is often sensitive to nuances in prompt templates, requiring careful prompt engineering and adding costs in terms of compute and human effort. In this study, we present experiments encompassing multiple LLM variants of varying sizes aimed at probing their preferences among different prompts. Through experiments on Question Answering, we show prompt preference consistency across LLMs of different sizes. We also show that this consistency extends to other tasks, such as Natural Language Inference. Leveraging this consistency, we propose a method that uses a smaller model to select effective prompt templates for a larger model. We show that our method substantially reduces the cost of prompt engineering while consistently matching the performance of the optimal prompt among the candidates. More importantly, our experiments show the efficacy of our strategy across fourteen LLMs and its applicability to a broad range of NLP tasks, highlighting its robustness.