Provably Learning from Language Feedback

📅 2025-06-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Interactive learning from natural language feedback (LLF) lacks a formal theoretical foundation. Method: We first formalize the LLF problem and characterize the conditions under which it is learnable. We introduce the *transfer eluder dimension* as a novel complexity measure and prove that rich language feedback can yield exponential sample-efficiency gains over reward-only learning. We propose HELiX, the first no-regret algorithm for LLF with theoretical guarantees, integrating LLM-based feedback modeling, uncertainty quantification, and sequential policy optimization. Our analysis extends the eluder dimension framework to establish policy convergence and sublinear regret bounds. Results: Experiments across diverse domains show that HELiX converges stably and consistently outperforms heuristic prompting baselines, empirically supporting the theoretical guarantees.

📝 Abstract
Interactively learning from observation and language feedback is an increasingly studied area driven by the emergence of large language model (LLM) agents. While impressive empirical demonstrations have been shown, so far a principled framing of these decision problems remains lacking. In this paper, we formalize the Learning from Language Feedback (LLF) problem, assert sufficient assumptions to enable learning despite latent rewards, and introduce $\textit{transfer eluder dimension}$ as a complexity measure to characterize the hardness of LLF problems. We show that transfer eluder dimension captures the intuition that information in the feedback changes the learning complexity of the LLF problem. We demonstrate cases where learning from rich language feedback can be exponentially faster than learning from reward. We develop a no-regret algorithm, called $\texttt{HELiX}$, that provably solves LLF problems through sequential interactions, with performance guarantees that scale with the transfer eluder dimension of the problem. Across several empirical domains, we show that $\texttt{HELiX}$ performs well even when repeatedly prompting LLMs does not work reliably. Our contributions mark a first step towards designing principled interactive learning algorithms from generic language feedback.
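To make the LLF setting concrete, here is a minimal sketch of a sequential interaction loop in the spirit the abstract describes: the learner maintains a version space of hypotheses about the latent reward, acts, and prunes hypotheses inconsistent with the language feedback it receives. The toy environment, the `feedback`/`consistent` functions, and the median-query rule are all illustrative assumptions, not the paper's actual HELiX algorithm.

```python
# Illustrative LLF interaction loop (NOT the paper's HELiX implementation).
# Toy setting: actions 0..4 with latent optimum BEST = 3; feedback is a
# string that carries directional information about the latent reward.
BEST = 3

def feedback(action):
    """Language feedback revealing the direction of the optimum."""
    if action == BEST:
        return "correct"
    return "go higher" if action < BEST else "go lower"

def consistent(hypothesis, action, fb):
    """Does a hypothesized optimum agree with the observed feedback?"""
    if fb == "correct":
        return hypothesis == action
    if fb == "go higher":
        return hypothesis > action
    return hypothesis < action

def llf_loop(horizon=10):
    hypotheses = set(range(5))  # version space over latent optima
    history = []
    for _ in range(horizon):
        # Query the median surviving hypothesis (a simple uncertainty rule).
        action = sorted(hypotheses)[len(hypotheses) // 2]
        fb = feedback(action)
        history.append((action, fb))
        # Prune hypotheses inconsistent with the language feedback.
        hypotheses = {h for h in hypotheses if consistent(h, action, fb)}
        if len(hypotheses) == 1:
            break
    return hypotheses, history

final, hist = llf_loop()
print(final, hist)  # the surviving hypothesis and the interaction history
```

Because directional feedback halves the version space per round, the loop identifies the optimum in O(log N) queries, while reward-only feedback ("correct"/"incorrect") would need O(N) in the worst case, mirroring the exponential gap the abstract claims.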
Problem

Research questions and friction points this paper is trying to address.

How can learning from natural language feedback (LLF) be formalized as a decision problem with latent rewards?
What complexity measure characterizes the hardness of LLF problems?
Can a provably no-regret algorithm be designed for LLF?
Innovation

Methods, ideas, or system contributions that make the work stand out.

A formalization of the Learning from Language Feedback (LLF) problem
The transfer eluder dimension, a complexity measure for LLF hardness
HELiX, a no-regret LLF algorithm with regret bounds scaling in the transfer eluder dimension