🤖 AI Summary
This work addresses a limitation of standard multi-run chain-of-thought (CoT) evaluation, which aggregates final answers and discards the structure of the intermediate reasoning paths. The authors propose SliceGraph, a post-hoc framework that builds a mutual k-nearest-neighbor graph for each problem-model cell from the Jaccard similarity of sparse activation keys between CoT slices. They introduce the term “process isomers” for correct trajectories that reach the same answer through different process families, where each family is internally strategy-coherent. Using biconnected component analysis, a label-seeded reward field, and typed-state transition kernels, the authors find process isomers in 85.5% of 954 problem-model cells; among cells with at least two such correct runs, 76.6% of run pairs are cross-family on average. Representation ablations and cross-architecture and cross-scale replications support the robustness of this multi-route structure.
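To make the pair statistic concrete, here is a minimal sketch of how a cross-family pair fraction could be computed per cell and then averaged. The function names, the label format, and the toy data are assumptions for illustration, not the authors' code.

```python
from itertools import combinations


def cross_family_fraction(families):
    """Fraction of unordered run pairs whose family labels differ.

    `families` holds one (hypothetical) process-family label per correct
    run in a single problem-model cell. Returns None for fewer than two runs.
    """
    pairs = list(combinations(families, 2))
    if not pairs:
        return None
    return sum(a != b for a, b in pairs) / len(pairs)


def mean_cross_family_fraction(cells):
    """Average the per-cell fraction over cells with at least two runs."""
    fractions = [f for f in map(cross_family_fraction, cells) if f is not None]
    return sum(fractions) / len(fractions) if fractions else None


# Toy data (made up): three cells; the last has one run and is skipped.
cells = [["A", "A", "B"], ["A", "B"], ["C"]]
mean_fraction = mean_cross_family_fraction(cells)
```

Averaging per cell, rather than pooling all pairs, keeps cells with many runs from dominating the statistic; whether the paper weights cells this way is an assumption here.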
📝 Abstract
Multi-run chain-of-thought reasoning is usually collapsed to final-answer aggregates, which discard how sampled trajectories share, split, and rejoin through intermediate computation. We propose SliceGraph, a post-hoc problem-model-cell graph built by mutual-kNN over sparse activation-key Jaccard similarity between CoT slices, and treat it as a measurement object for process geometry rather than as a decoding program. Across sampled CoT ensembles from three primary 4B/8B models on math and science benchmarks, blinded annotation supports SliceGraph biconnected components as shared reasoning-state units and process families as within-family strategy-coherent route units. In 85.5% of 954 problem-model cells, correct CoTs sharing the same normalized answer split into multiple process families; among cells with at least two such runs, 76.6% of run pairs are cross-family on average. We call such same-answer, family-divergent correct trajectories process isomers. A label-seeded reward field provides a separate value-landscape layer: success-associated regions often split into disconnected high-value cores, and route families specialize over these core footprints rather than merely duplicating one another. A typed-state transition analysis further shows that process families navigate the same atlas with distinct transition kernels under matched null controls. Representation ablations, a cross-architecture replication, and two cross-scale replications support the robustness of the route-family scaffold, showing that final-answer aggregation overlooks this structured multi-route process geometry.
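The graph construction named in the abstract (mutual-kNN over Jaccard similarity of activation-key sets) can be sketched as follows. This is an illustrative sketch only: representing each CoT slice as a Python set of integer key IDs, the tie-breaking rule, and the dropping of zero-similarity neighbors are all assumptions, and the biconnected-component step is omitted.

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mutual_knn_edges(key_sets, k):
    """Return undirected edges (i, j), i < j, of a mutual-kNN graph.

    `key_sets[i]` is the (hypothetical) set of active key IDs for slice i.
    An edge is kept only when i is among j's top-k neighbors and j is
    among i's top-k neighbors.
    """
    n = len(key_sets)
    neighbors = []
    for i in range(n):
        scored = [(jaccard(key_sets[i], key_sets[j]), j) for j in range(n) if j != i]
        # Highest similarity first; ties go to the lower index (an assumption).
        scored.sort(key=lambda t: (-t[0], t[1]))
        # Zero-similarity candidates are dropped (also an assumption).
        neighbors.append({j for s, j in scored[:k] if s > 0})
    return {
        (i, j)
        for i in range(n)
        for j in neighbors[i]
        if i < j and i in neighbors[j]
    }


# Toy slices (made up): three overlapping key sets and one unrelated set.
slices = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {10, 11}]
edges_k1 = mutual_knn_edges(slices, k=1)
edges_k2 = mutual_knn_edges(slices, k=2)
```

The mutual condition is stricter than plain kNN: with `k=1`, slice 2 selects slice 1 as its nearest neighbor, but slice 1 selects slice 0, so no edge joins slices 1 and 2.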