🤖 AI Summary
In multi-step reasoning with large language models (LLMs), user feedback often suffers from imprecise error localization and unreliable integration, leading to uncorrected or newly introduced errors. To address this, we propose **in-situ feedback**, a novel paradigm wherein users directly edit the model’s output, and the model generates a revised response conditioned on the edited text—enabling precise error localization and localized correction. This is the first approach to treat human edits as explicit feedback signals, formalized via conditional generation to avoid error propagation and context confusion inherent in conventional multi-turn dialogue. Evaluated on multiple challenging reasoning benchmarks, our method achieves substantial accuracy gains over strong baselines while reducing token consumption by 79.1%, significantly improving inference efficiency, response consistency, and interactive controllability.
📝 Abstract
Large language models (LLMs) are increasingly studied in the context of multi-turn reasoning, where models iteratively refine their outputs based on user-provided feedback. Such settings are crucial for tasks that require complex reasoning, yet existing feedback paradigms often rely on issuing new messages. LLMs struggle to integrate these reliably, leading to inconsistent improvements. In this work, we introduce in-place feedback, a novel interaction paradigm in which users directly edit an LLM's previous response, and the model conditions on this modified response to generate its revision. Empirical evaluations on diverse reasoning-intensive benchmarks reveal that in-place feedback achieves better performance than conventional multi-turn feedback while using $79.1%$ fewer tokens. Complementary analyses on controlled environments further demonstrate that in-place feedback resolves a core limitation of multi-turn feedback: models often fail to apply feedback precisely to erroneous parts of the response, leaving errors uncorrected and sometimes introducing new mistakes into previously correct content. These findings suggest that in-place feedback offers a more natural and effective mechanism for guiding LLMs in reasoning-intensive tasks.