🤖 AI Summary
Designing prompts that fully harness the capabilities of large language models (LLMs) remains challenging for users, which limits the models' practical performance. To address this, we propose an end-to-end reinforcement learning framework that enables collaborative reasoning between small- and large-scale language models: a lightweight model generates high-quality prompts, which a larger LLM then executes for complex reasoning. We formalize this collaboration as a multi-turn prompt interaction process and introduce a dual-constraint reward mechanism that jointly optimizes answer correctness and reasoning quality. The framework is modular and plug-and-play, supporting flexible combinations of diverse LLMs. Extensive experiments on multiple public benchmarks demonstrate significant improvements over strong baselines, validating its effectiveness, its generalizability across tasks and domains, and its cross-model compatibility.
📝 Abstract
Advanced large language models (LLMs) have been emerging at an increasingly rapid pace. However, when faced with complex problems, most users are unable to craft accurate and effective prompts for interacting with LLMs, which limits the models' performance. To address this challenge, we propose Prompt-R1, an end-to-end reinforcement learning framework in which a small-scale LLM collaborates with large-scale LLMs, standing in for the user to solve problems more effectively. This collaboration is cast as a multi-turn prompt interaction: the small-scale LLM thinks and generates prompts, while the large-scale LLM performs the complex reasoning. A dual-constrained reward is designed to jointly optimize for answer correctness, prompt generation quality, and reasoning accuracy. Prompt-R1 is a plug-and-play framework that supports both inference and training with various large-scale LLMs. Experiments on multiple public datasets show that Prompt-R1 significantly outperforms baseline models across tasks. Our code is publicly available at https://github.com/QwenQKing/Prompt-R1.
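The multi-turn interaction described above can be sketched as a simple loop. This is a minimal illustration only, not the paper's implementation: `small_llm`, `large_llm`, the reward weights, and the quality heuristic are all hypothetical stand-ins for the actual trained models and dual-constrained reward.

```python
def small_llm(question, history):
    """Stub for the small-scale model: thinks and emits a prompt (hypothetical)."""
    return f"Turn {len(history) + 1}: reason step by step about: {question}"

def large_llm(prompt):
    """Stub for the large-scale model: answers the generated prompt (hypothetical)."""
    return "42" if "6 * 7" in prompt else "unknown"

def dual_reward(answer, gold, prompt):
    """Illustrative dual-constrained reward: correctness plus a crude quality term."""
    correctness = 1.0 if answer.strip() == gold else 0.0
    quality = min(len(prompt.split()), 20) / 20.0  # placeholder for prompt-quality checks
    return 0.8 * correctness + 0.2 * quality       # weights are assumptions

def collaborate(question, gold, max_turns=3):
    """Run the multi-turn prompt interaction; return the best answer and its reward."""
    history, best = [], ("", -1.0)
    for _ in range(max_turns):
        prompt = small_llm(question, history)   # small model generates the prompt
        answer = large_llm(prompt)              # large model does the reasoning
        reward = dual_reward(answer, gold, prompt)
        history.append((prompt, answer, reward))
        if reward > best[1]:
            best = (answer, reward)
        if answer.strip() == gold:              # stop early once correct
            break
    return best

answer, reward = collaborate("What is 6 * 7?", "42")
```

In training, the reward would update the small model's policy (e.g. via PPO/GRPO-style RL) while the large model stays frozen, which is what makes the framework plug-and-play across different large-scale LLMs.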