🤖 AI Summary
This work addresses the information bottleneck induced by scalar reward signals in preference learning. We propose a structured, text-feedback-based directional optimization framework that operates directly in textual output spaces. Methodologically, we combine pairwise comparisons with large language model (LLM) in-context learning to convert fine-grained textual critiques into gradient-like editing directions, enabling iterative, parameter-free, inference-time optimization. Our key contribution lies in preserving high-bandwidth textual feedback -- bypassing lossy scalar quantization -- and performing direct optimization over diverse textual artifacts, including prompts, code, and molecular SMILES strings. Experiments demonstrate consistent, significant improvements over state-of-the-art baselines across prompt engineering, reinforcement learning benchmarks, and graph-structured molecular optimization. Notably, on the DOCKSTRING benchmark, our approach discovers novel drug-like molecules whose docking affinity ranks at the 99.9th percentile among more than 260,000 compounds.
📝 Abstract
We introduce *Feedback Descent*, a framework that optimizes text artifacts -- prompts, code, and molecules -- through structured textual feedback rather than relying solely on scalar rewards. By preserving detailed critiques instead of compressing them into binary preferences, Feedback Descent widens the information bottleneck in preference learning, enabling directed optimization in text space rather than weight space. We show that in-context learning can transform structured feedback into gradient-like directional information, enabling targeted edits. Unlike prior approaches that collapse judgments into single bits, our evaluators pair each comparison with textual feedback, which functions as high-bandwidth supervision. The iteration loop runs purely at inference time, without modifying any model weights, and is task-agnostic. We evaluate Feedback Descent on three diverse domains and find that it outperforms state-of-the-art prompt optimization (GEPA), reinforcement learning methods (GRPO, REINVENT), and even specialized graph-based molecular optimizers. On the DOCKSTRING molecule discovery benchmark, Feedback Descent identifies novel drug-like molecules surpassing the $99.9$th percentile of a database of more than $260{,}000$ compounds across six protein targets.
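The loop the abstract describes -- obtain a textual critique, turn it into a targeted edit, compare candidates pairwise, keep the winner, repeat at inference time -- can be sketched as follows. This is a toy illustration under invented assumptions, not the paper's implementation: `critique` and `edit` are deterministic stand-ins for the LLM evaluator and in-context editor, and the string-completion task exists only to make the sketch runnable.

```python
def critique(artifact: str, target: str) -> str:
    """Stand-in evaluator: emits textual feedback rather than a scalar reward."""
    for ch in target:
        if ch not in artifact:
            return f"the artifact is missing the character '{ch}'"
    return "no issues found"

def score(artifact: str, target: str) -> int:
    """Used only for the pairwise comparison between candidates."""
    return sum(1 for ch in target if ch in artifact)

def edit(artifact: str, feedback: str) -> str:
    """Stand-in editor: applies the critique as a gradient-like editing direction."""
    if "missing the character" in feedback:
        missing_ch = feedback.split("'")[1]  # parse the critique
        return artifact + missing_ch
    return artifact  # no actionable feedback, leave the artifact unchanged

def feedback_descent(artifact: str, target: str, steps: int = 10) -> str:
    """Inference-time loop: critique, edit, pairwise-compare, keep the winner."""
    for _ in range(steps):
        candidate = edit(artifact, critique(artifact, target))
        if score(candidate, target) > score(artifact, target):
            artifact = candidate  # candidate wins the pairwise comparison
    return artifact
```

No weights are updated anywhere in the loop; all optimization pressure comes from the textual critique and the pairwise comparison, which is the distinction the abstract draws against scalar-reward methods.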