Prompt-R1: Collaborative Automatic Prompting Framework via End-to-end Reinforcement Learning

📅 2025-11-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Designing effective prompts to fully harness the capabilities of large language models (LLMs) remains challenging for users, limiting the models' practical performance. To address this, we propose an end-to-end reinforcement learning framework that enables collaborative reasoning between small- and large-scale language models: a lightweight model generates high-quality prompts, which are then executed by a larger LLM for complex reasoning. We formalize this collaboration as a multi-turn prompt interaction process and introduce a dual-constrained reward mechanism that jointly optimizes for answer correctness and reasoning quality. The framework is modular and plug-and-play, supporting flexible combinations of diverse LLMs. Extensive experiments on multiple public benchmarks demonstrate significant improvements over strong baselines, validating its effectiveness, its generalizability across tasks and domains, and its cross-model compatibility.

📝 Abstract
Recently, advanced large language models (LLMs) have emerged at an increasingly rapid pace. However, when faced with complex problems, most users are unable to provide accurate and effective prompts to interact with LLMs, thus limiting the performance of LLMs. To address this challenge, we propose Prompt-R1, an end-to-end reinforcement learning framework in which a small-scale LLM collaborates with large-scale LLMs, acting in place of the user to solve problems more effectively. This collaboration is cast as a multi-turn prompt interaction, where the small-scale LLM thinks and generates prompts, and the large-scale LLM performs complex reasoning. A dual-constrained reward is designed to jointly optimize correctness, generation quality, and reasoning accuracy. Prompt-R1 provides a plug-and-play framework that supports both inference and training with various large-scale LLMs. Experiments on multiple public datasets show that Prompt-R1 significantly outperforms baseline models across tasks. Our code is publicly available at https://github.com/QwenQKing/Prompt-R1.
Problem

Research questions and friction points this paper is trying to address.

Addresses users' inability to provide accurate, effective prompts for complex problems with LLMs
Proposes a collaborative framework in which a small LLM generates prompts for large LLMs
Optimizes prompt generation for correctness, quality, and reasoning accuracy via reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

A small LLM learns to collaborate with large LLMs via end-to-end reinforcement learning
Casts the collaboration as a multi-turn prompt interaction for complex reasoning tasks
Optimizes prompt generation with a dual-constrained reward for correctness and reasoning quality
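The collaboration loop described above can be sketched in a few lines. This is an illustrative mock, not the paper's implementation: the stand-in model functions, the reward weights, the length-based quality proxy, and the early-stopping rule are all assumptions for demonstration.

```python
# Sketch of a Prompt-R1-style multi-turn collaboration rollout (illustrative).
# small_llm_prompt / large_llm_reason are mock stand-ins for real model calls;
# the reward weights and quality proxy are assumed, not from the paper.

def small_llm_prompt(question, history):
    """Stand-in for the small prompt-generating LLM."""
    return f"Think step by step and answer: {question} (turn {len(history) + 1})"

def large_llm_reason(prompt):
    """Stand-in for the large reasoning LLM."""
    return {"answer": "42", "reasoning": f"Reasoning about: {prompt}"}

def dual_constrained_reward(answer, reasoning, gold, w_correct=0.7, w_quality=0.3):
    """Combine answer correctness with a crude reasoning-quality proxy."""
    correctness = 1.0 if answer == gold else 0.0
    quality = min(len(reasoning) / 100.0, 1.0)  # placeholder quality signal
    return w_correct * correctness + w_quality * quality

def rollout(question, gold, max_turns=3):
    """Multi-turn interaction: the small LLM prompts, the large LLM reasons."""
    history, total_reward = [], 0.0
    for _ in range(max_turns):
        prompt = small_llm_prompt(question, history)
        result = large_llm_reason(prompt)
        reward = dual_constrained_reward(result["answer"], result["reasoning"], gold)
        history.append((prompt, result, reward))
        total_reward += reward
        if result["answer"] == gold:  # stop once the answer is correct
            break
    return history, total_reward
```

In training, the per-rollout reward would drive a policy-gradient update of the small LLM only, while the large LLM stays frozen, which is what makes the framework plug-and-play across different large-scale models.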