Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning

📅 2026-02-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a previously overlooked phenomenon in large language models, the "Echo of Prompt" (EOP), in which models spontaneously restate the input prompt early in inference. The study formalizes the probabilistic cost of this behavior as a computable "Echo Likelihood Gap" and demonstrates its strong correlation with downstream task accuracy. To explain the underlying mechanism, the authors propose an attention refocusing account of how EOP supports subsequent reasoning. Building on this insight, they introduce a suite of techniques, including Echo-Distilled Supervised Fine-Tuning (ED-SFT), Echoic Prompting, and rejection-based probabilistic conditioning, to encourage an "echo-then-reason" generation pattern. Evaluated on mathematical reasoning benchmarks such as GSM8K, MathQA, and MATH, the approach consistently improves performance under identical decoding configurations and compute budgets.
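To make the "Echo Likelihood Gap" concrete, the sketch below shows one plausible way to compute such a gap: the difference in average per-token log-likelihood of the same reasoning continuation when an echo of the question precedes it versus when it does not. The model name, echo template, and length normalization are illustrative assumptions; the paper's exact definition may differ.

```python
# Minimal sketch of a length-normalized "echo likelihood gap" proxy.
# Assumption: Delta L compares the log-likelihood of the same reasoning
# continuation with and without an echoed copy of the question in context.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-1.5B-Instruct"  # any causal LM works; this choice is an assumption
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float32)
model.eval()

@torch.no_grad()
def avg_logprob(context: str, continuation: str) -> float:
    """Average per-token log-probability of `continuation` given `context`."""
    ctx_ids = tok(context, return_tensors="pt").input_ids
    cont_ids = tok(continuation, add_special_tokens=False, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, cont_ids], dim=1)
    logits = model(input_ids).logits[:, :-1, :]          # position t predicts token t+1
    logprobs = torch.log_softmax(logits, dim=-1)
    target = input_ids[:, 1:]
    token_lp = logprobs.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    start = ctx_ids.shape[1] - 1                          # score only continuation tokens
    return token_lp[:, start:].mean().item()

question = ("Natalia sold clips to 48 of her friends in April, then half as many in May. "
            "How many clips did she sell altogether?")
echo = f"Restating the question: {question}\n"            # hypothetical echo format
reasoning = "In May she sold 48 / 2 = 24 clips, so 48 + 24 = 72 clips in total. The answer is 72."

delta_L = avg_logprob(question + "\n" + echo, reasoning) - avg_logprob(question + "\n", reasoning)
print(f"Echo Likelihood Gap (proxy): {delta_L:+.4f} nats/token")
```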

📝 Abstract
Test-time compute allocation in large reasoning models (LRMs) is widely used, with applications in mathematical problem solving, code synthesis, and planning. Recent work has addressed this problem by scaling self-consistency and parallel thinking, by adding generic "thinking tokens", and by prompting models to re-read the question before answering. Unfortunately, these approaches either inject task-agnostic tokens or mandate heuristics that do not explain, and often ignore, the spontaneous repetition that many LRMs exhibit at the head of their internal chains. In contrast, we analyze and harness the model's tendency to restate the question, which we term the Echo of Prompt (EOP), as a front-loaded, compute-shaping mechanism. We formalize its probabilistic cost by casting echo removal as rejection-based conditioning and defining the Echo Likelihood Gap $\Delta\mathcal{L}$ as a computable proxy. This provides the missing theoretical link between early repetition, likelihood gains, and downstream accuracy. However, it does not by itself specify how to exploit EOP. Consequently, we develop Echo-Distilled SFT (ED-SFT) to instill an "echo-then-reason" pattern through supervised finetuning, and Echoic Prompting (EP) to re-ground the model mid-trace without training. While promising, quantifying benefits beyond verbosity is non-trivial. Therefore, we conduct length- and suffix-controlled likelihood analyses together with layer-wise attention studies, showing that EOP increases answer-to-answer-prefix attention in middle layers, consistent with an attention refocusing mechanism. We evaluate on GSM8K, MathQA, Hendrycks-MATH, AIME24, and MATH-500 under identical decoding settings and budgets, and find consistent gains over baselines. Code is available at https://github.com/hhh2210/echoes-as-anchors.
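As a rough illustration of how Echoic Prompting might re-ground a model mid-trace without any training, the following sketch pauses greedy decoding after a fixed token budget, re-inserts the question as an echo, and then resumes generation. The trigger point, echo template, and model choice are assumptions for illustration, not the paper's exact procedure.

```python
# Illustrative sketch of mid-trace re-grounding in the spirit of Echoic Prompting:
# generate part of the chain, echo the question back into the trace, then continue.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-1.5B-Instruct"  # assumed model choice
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float32)
model.eval()

def generate(text: str, max_new_tokens: int) -> str:
    """Greedy continuation of `text`, returning only the newly generated tokens."""
    ids = tok(text, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=False,
                         pad_token_id=tok.eos_token_id)
    return tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)

def echoic_prompting(question: str, budget: int = 256, echo_at: int = 96) -> str:
    prompt = f"Question: {question}\nLet's think step by step.\n"
    first_part = generate(prompt, max_new_tokens=echo_at)
    # Mid-trace re-grounding: echo the question into the trace, then keep decoding.
    echo = f"\n(Recalling the question: {question})\n"
    rest = generate(prompt + first_part + echo, max_new_tokens=budget - echo_at)
    return first_part + echo + rest

print(echoic_prompting("A train travels 60 km in 45 minutes. What is its average speed in km/h?"))
```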
Problem

Research questions and friction points this paper is trying to address.

test-time compute allocation
large reasoning models
prompt repetition
spontaneous echoing
reasoning efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Echo of Prompt
Attention Refocusing
Probabilistic Conditioning
Supervised Fine-Tuning
Test-time Compute Allocation