AI Summary
Existing approaches struggle to characterize the reasoning dynamics of large language models during inference: relying on response length alone fails to capture the efficacy or stagnation patterns of chain-of-thought (CoT) reasoning. This work introduces recurrence quantification analysis (RQA) to the study of large-model inference for the first time, modeling the trajectory of hidden states during token generation as a dynamical system to uncover repetitive and stagnant behaviors in reasoning. By offering a non-textual, latent-space-dynamics perspective, the proposed method substantially outperforms conventional metrics. Experiments on 3,600 reasoning trajectories generated by DeepSeek-R1-Distill demonstrate that RQA-based measures improve task-complexity prediction accuracy by 8%, capturing reasoning dynamics that response length alone cannot reveal.
Abstract
Test-time compute is central to large reasoning models, yet analysing their reasoning behaviour through generated text is increasingly impractical and unreliable. Response length is often used as a crude proxy for reasoning effort, but this metric fails to capture the dynamics and effectiveness of the Chain of Thought (CoT) or the generated tokens. We propose Recurrence Quantification Analysis (RQA) as a non-textual alternative for analysing a model's reasoning chains at test time. Treating token generation as a dynamical system, we extract hidden embeddings at each generation step and apply RQA to the resulting trajectories. RQA metrics, including Determinism and Laminarity, quantify patterns of repetition and stalling in the model's latent representations. Analysing 3,600 generation traces from DeepSeek-R1-Distill, we show that RQA not only captures signals not reflected by response length, but also substantially improves prediction of task complexity by 8%. These results help establish RQA as a principled tool for studying the latent token-generation dynamics of test-time scaling in reasoning models.
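The core computation the abstract describes can be sketched as follows: build a recurrence matrix over the trajectory of per-step hidden states, then measure Determinism (DET, the fraction of recurrent points lying on diagonal lines, i.e. repeated sub-sequences) and Laminarity (LAM, the fraction lying on vertical lines, i.e. stalling near one state). This is an illustrative sketch, not the paper's implementation; the recurrence radius `eps`, minimum line length `l_min`, and the normalisation convention (here, excluding the main diagonal) are assumptions, and conventions vary across the RQA literature.

```python
import numpy as np

def run_lengths(bits):
    """Lengths of consecutive runs of True in a binary sequence."""
    lengths, run = [], 0
    for b in bits:
        if b:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths

def rqa_metrics(traj, eps, l_min=2):
    """DET and LAM for a (T, D) array of hidden states, one row per
    generation step. eps is the recurrence radius; l_min the minimum
    line length counted as structure. Illustrative values only."""
    T = len(traj)
    # Recurrence matrix: R[i, j] = True if states i and j are within eps.
    dist = np.linalg.norm(traj[:, None] - traj[None, :], axis=-1)
    R = dist <= eps
    off = R & ~np.eye(T, dtype=bool)  # drop the trivial self-recurrences
    total = off.sum()
    if total == 0:
        return 0.0, 0.0
    # DET: recurrent points on diagonal lines of length >= l_min
    # (upper diagonals doubled, since the matrix is symmetric).
    diag_pts = 2 * sum(l for k in range(1, T)
                       for l in run_lengths(np.diag(off, k)) if l >= l_min)
    # LAM: recurrent points on vertical lines of length >= l_min.
    vert_pts = sum(l for j in range(T)
                   for l in run_lengths(off[:, j]) if l >= l_min)
    return diag_pts / total, vert_pts / total
```

A trajectory stuck near one latent state yields DET and LAM close to 1, while a trajectory whose states stay farther than `eps` apart yields 0 for both, which is the kind of repetition-versus-progress signal response length cannot expose.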