🤖 AI Summary
This work proposes TRACED, a framework that addresses the limitations of scalar-probability evaluation in capturing the structural dynamics and reliability of large language model (LLM) reasoning. Introducing geometric dynamics into LLM reasoning analysis for the first time, TRACED models reasoning trajectories along two geometric dimensions, "progress" (displacement) and "stability" (curvature), revealing a fundamental topological distinction between correct reasoning and hallucination. The framework maps "hesitation loops" to high curvature and "certainty accumulation" to displacement, offering a physical perspective on machine cognition. Experiments show that TRACED achieves strong performance and robustness across multiple benchmarks, effectively discriminating correct reasoning from hallucinatory behavior.
📝 Abstract
Evaluating LLM reliability via scalar probabilities often fails to capture the structural dynamics of reasoning. We introduce TRACED, a framework that assesses reasoning quality through theoretically grounded geometric kinematics. By decomposing reasoning traces into Progress (displacement) and Stability (curvature), we reveal a distinct topological divergence: correct reasoning manifests as high-progress, stable trajectories, whereas hallucinations are characterized by low-progress, unstable patterns (stalled displacement with high curvature fluctuations). Leveraging these signatures, our probabilistic framework achieves competitive performance and superior robustness across diverse benchmarks. Crucially, TRACED bridges geometry and cognition by mapping high curvature to "Hesitation Loops" and displacement to "Certainty Accumulation", offering a physical lens to decode the internal dynamics of machine thought.
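To make the Progress/Stability decomposition concrete, here is a minimal illustrative sketch (not the paper's actual implementation; the function name and the choice of turning angle as a curvature proxy are assumptions) that computes per-step displacement and a discrete curvature signal from a trajectory of hidden states:

```python
import numpy as np

def trajectory_kinematics(states):
    """Illustrative sketch: per-step displacement and a discrete
    curvature proxy for a reasoning trajectory given as a (T, d)
    array of hidden states. Not the paper's implementation."""
    states = np.asarray(states, dtype=float)
    steps = np.diff(states, axis=0)               # (T-1, d) step vectors
    displacement = np.linalg.norm(steps, axis=1)  # "Progress" per step

    # Curvature proxy: turning angle between consecutive step vectors.
    # Near-zero angles = a stable, directed trajectory; large angles
    # suggest the "hesitation loops" associated with hallucination.
    v1, v2 = steps[:-1], steps[1:]
    cos = np.einsum("ij,ij->i", v1, v2) / (
        np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1) + 1e-12
    )
    curvature = np.arccos(np.clip(cos, -1.0, 1.0))  # radians, shape (T-2,)
    return displacement, curvature
```

Under this sketch, a straight-line trajectory yields constant displacement and near-zero curvature (the "correct reasoning" signature), while a trajectory that repeatedly backtracks yields small net progress and turning angles near π.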