🤖 AI Summary
This work addresses the challenge of enhancing the reasoning capabilities of large language models (LLMs) **without parameter updates** to the target model. To this end, we propose TRPrompt, a framework that trains a lightweight prompt model directly on **high-resolution textual feedback signals**, enabling query-aware, iterative prompt optimization. Its core contribution is the first integration of fine-grained textual feedback into prompt-model training, thereby **unifying feedback-driven and reward-driven paradigms** and supporting bootstrapped optimization without prior dataset collection. TRPrompt combines a textual reward mechanism, query-aware prompt modeling, and the LLM's capacity to internalize what a "good" prompt is, establishing a closed-loop feedback training system. On the challenging mathematical reasoning benchmarks GSMHard and MATH, TRPrompt generates query-specific prompts that substantially outperform existing training-free methods, achieving state-of-the-art performance.
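To make the closed loop concrete, here is a minimal sketch of a TRPrompt-style iteration, assuming the structure described above (propose a query-aware prompt, answer with the frozen target model, collect textual feedback, and train the prompt model on that feedback). All function and class names below are hypothetical stand-ins; the paper's actual interfaces and training procedure are not specified in this summary.

```python
# Hedged sketch of a TRPrompt-style closed feedback loop.
# target_llm, feedback_llm, and PromptModel are toy stand-ins, not the real framework.

def target_llm(prompt: str, query: str) -> str:
    """Stand-in for the frozen target model (no parameter updates) answering a query."""
    return f"answer({query})"

def feedback_llm(prompt: str, query: str, answer: str) -> str:
    """Stand-in for an LLM producing high-resolution *textual* feedback on the prompt."""
    return f"feedback: prompt {prompt!r} could be more specific for {query!r}"

class PromptModel:
    """Toy stand-in for the lightweight prompt model being trained."""
    def __init__(self):
        self.history = []  # (query, prompt, textual reward) triples seen so far

    def propose(self, query: str) -> str:
        # Query-aware prompt, conditioned on the query and accumulated feedback.
        return f"Think step by step about {query}. [{len(self.history)} feedback rounds]"

    def train_on(self, query: str, prompt: str, feedback: str) -> None:
        # Real training would fine-tune the model on these triples;
        # here we only record them.
        self.history.append((query, prompt, feedback))

def trprompt_loop(queries, rounds=3):
    """Iterative, dataset-free bootstrapping: feedback on generated prompts drives training."""
    model = PromptModel()
    for _ in range(rounds):
        for q in queries:
            p = model.propose(q)           # query-specific prompt
            a = target_llm(p, q)           # frozen target model answers
            fb = feedback_llm(p, q, a)     # textual reward on the prompt
            model.train_on(q, p, fb)       # feedback updates the prompt model
    return model
```

The key contrast with numerical-reward prompt training is that `feedback_llm` returns free-form text rather than a scalar score, which is the "high-resolution signal" the framework trains on.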
📝 Abstract
Prompt optimization improves the reasoning abilities of large language models (LLMs) without requiring parameter updates to the target model. Following heuristic "Think step by step" approaches, the field has evolved in two main directions: while one group of methods uses textual feedback to elicit improved prompts from general-purpose LLMs in a training-free way, a concurrent line of research relies on numerical rewards to train a specialized prompt model tailored to provide optimal prompts to the target model. In this paper, we introduce the Textual Reward Prompt framework (TRPrompt), which unifies these approaches by directly incorporating textual feedback into the training of the prompt model. Our framework does not require prior dataset collection and is iteratively improved using feedback on the generated prompts. When coupled with the capacity of an LLM to internalize the notion of what a "good" prompt is, the high-resolution signal provided by the textual rewards allows us to train a prompt model that yields state-of-the-art query-specific prompts for problems from the challenging math datasets GSMHard and MATH.