🤖 AI Summary
This paper challenges the theoretical foundations and explanatory power of “text gradient”-based automated prompt optimization methods, which metaphorically equate discrete text updates with continuous, differentiable gradient descent. Method: Through systematic prompt optimization experiments on LLMs, multi-task comparative analysis, ablation studies, and behavioral attribution, we rigorously examine whether these methods operate as genuine gradient-based optimizers. Contribution/Results: We demonstrate that performance gains are not attributable to gradient-style update logic; instead, “text gradients” function merely as empirical heuristics without theoretical grounding in differentiable optimization. First, we formally establish their non-gradient nature. Second, we propose a novel conceptual framework for prompt optimization explicitly tailored to discrete text spaces. Third, we advocate shifting prompt engineering from analogical transfer (e.g., borrowing optimization metaphors from continuous domains) toward intrinsic, ontology-aware modeling. Together, these findings call for a fundamental methodological rethinking of prompt optimization.
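For intuition, the “text gradient” loop the paper examines can be sketched as a critique-then-revise cycle. This is a minimal, illustrative sketch only, not the paper’s implementation: the function names (`critique`, `revise`, `evaluate`) are assumptions, and the toy critic/editor stand in for the LLM calls a real system would make, so the loop runs without any model.

```python
def evaluate(prompt, examples):
    """Toy scorer: fraction of target keywords the prompt already contains."""
    return sum(ex in prompt for ex in examples) / len(examples)

def critique(prompt, examples):
    """Stand-in for an LLM-written 'textual gradient': a list of missing keywords.
    In real systems this is natural-language feedback on the prompt's failures."""
    return [ex for ex in examples if ex not in prompt]

def revise(prompt, feedback):
    """Stand-in for the LLM edit step: apply one piece of feedback per iteration."""
    return prompt + " " + feedback[0] if feedback else prompt

def optimize(prompt, examples, steps=5):
    """Iterate critique -> revise until feedback is exhausted or steps run out."""
    for _ in range(steps):
        feedback = critique(prompt, examples)
        if not feedback:
            break
        prompt = revise(prompt, feedback)
    return prompt, evaluate(prompt, examples)

prompt, score = optimize("Answer concisely.", ["cite sources", "show steps"])
```

The loop borrows gradient-descent vocabulary (feedback as a “gradient”, revision as an “update”), but, as the paper argues, each step is a discrete heuristic edit rather than a differentiable update.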
📝 Abstract
A well-engineered prompt can improve the performance of large language models; automatic prompt optimization techniques aim to deliver such gains without requiring human effort to tune prompts. One leading class of prompt optimization techniques introduces the analogy of textual gradients. We investigate the behavior of these textual-gradient methods through a series of experiments and case studies. While such methods often yield a performance improvement, our experiments suggest that the gradient analogy does not accurately explain their behavior. Our insights may inform the selection of prompt optimization strategies and the development of new approaches.