Latent Chain-of-Thought Improves Structured-Data Transformers

📅 2026-05-11
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work targets the limited reasoning capability of Transformers on structured data, such as time series and tabular datasets, by introducing a latent (implicit) chain-of-thought mechanism. After an initial forward pass, the hidden states at query positions are compressed into feedback tokens, appended to the input, and processed again by the same model, so that weight-shared recurrent computation adds rounds of test-time reasoning before prediction. The approach adapts chain-of-thought principles to structured-data domains. Across 36 datasets, latent chain-of-thought improves over the baseline with an average gain of 10.99% on time-series forecasting (better on 8 of 9 datasets) and 5.31% on tabular prediction (better on 22 of 27 datasets).
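The feedback loop summarized above can be illustrated with a toy sketch. Everything here is an assumption made for illustration rather than the paper's implementation: `attention_layer` is a single self-attention layer standing in for the full structured-data transformer, `w_compress` is a hypothetical linear map standing in for the unspecified compression step, and feedback tokens are replaced (not accumulated) on each round.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention_layer(tokens, w):
    """Toy self-attention layer with a residual connection (stand-in for the model)."""
    d = len(tokens[0])
    proj = matmul(tokens, w)  # one shared projection used as queries, keys, and values
    mixed_rows = []
    for q in proj:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in proj]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        mixed_rows.append(
            [sum(e / total * v[j] for e, v in zip(exps, proj)) for j in range(d)]
        )
    return [[t + m for t, m in zip(tok, row)] for tok, row in zip(tokens, mixed_rows)]


def latent_cot_forward(context, queries, w_model, w_compress, rounds):
    """Initial pass, then `rounds` extra passes that reuse the same model weights."""
    n_ctx, n_q = len(context), len(queries)
    query_hidden = attention_layer(context + queries, w_model)[n_ctx:n_ctx + n_q]
    for _ in range(rounds):
        # Assumption: compression is a linear map, one feedback token per query.
        feedback = matmul(query_hidden, w_compress)
        # Feedback tokens are appended to the input and the sequence is reprocessed.
        hidden = attention_layer(context + queries + feedback, w_model)
        query_hidden = hidden[n_ctx:n_ctx + n_q]
    return query_hidden  # a prediction head would read these states


random.seed(0)
dim = 4
context = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(2)]
w_model = [[random.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]
w_compress = [[random.gauss(0, 0.5) for _ in range(dim)] for _ in range(dim)]

refined = latent_cot_forward(context, queries, w_model, w_compress, rounds=2)
print(len(refined), len(refined[0]))
```

With `rounds=0` the function reduces to a single ordinary forward pass, which is how a no-CoT model of the same depth would behave in this toy setting.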
📝 Abstract
Chain-of-thought and more broadly test-time compute are known to augment the expressive capabilities of language models and have led to major innovations in reasoning. Motivated by this success, this paper explores latent chain-of-thought as well as the impact of depth and looping for time-series and tabular data. We propose a recurrent scheme in which a structured-data transformer, after an initial forward pass, compresses its query-position hidden states into feedback tokens that are appended to the input and processed again, allowing multiple rounds of latent computation before prediction. We compare CoT models against a same-depth no-CoT baseline, a deeper baseline matched to the CoT model in effective depth, and a looped transformer with weight-tied recurrence but no additional chain-of-thought tokens. Across 36 datasets in time-series forecasting and tabular prediction, latent chain-of-thought improves over the baseline on 8/9 time-series datasets (+10.99% average gain) and 22/27 tabular datasets (+5.31% average gain). Across both settings, the CoT models perform the best on average. These results demonstrate that chain-of-thought is a useful axis for scaling test-time compute for structured data.
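To make the four-way comparison in the abstract concrete, the sketch below tabulates how the variants might differ in parameters and sequential compute. It rests on an assumption not stated in the abstract: that "effective depth" means the number of layers multiplied by the number of passes. The `Variant` class and `compared_variants` function are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    unique_layers: int        # layers that carry their own parameters
    layer_applications: int   # sequential layer evaluations per prediction
    feedback_tokens: bool     # whether latent tokens are appended to the input


def compared_variants(layers: int, extra_rounds: int) -> list[Variant]:
    """List the compared models under the layers-times-passes assumption."""
    passes = extra_rounds + 1  # the initial pass plus each feedback round
    return [
        Variant("same-depth no-CoT baseline", layers, layers, False),
        Variant("deeper baseline", layers * passes, layers * passes, False),
        Variant("looped transformer", layers, layers * passes, False),
        Variant("latent CoT", layers, layers * passes, True),
    ]


for v in compared_variants(layers=4, extra_rounds=2):
    print(
        f"{v.name:<28} unique layers={v.unique_layers:>2} "
        f"applications={v.layer_applications:>2} feedback={v.feedback_tokens}"
    )
```

Read this way, the deeper baseline controls for depth, the looped transformer controls for weight-tied recurrence, and only the latent CoT model adds feedback tokens on top of that recurrence.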
Problem

Research questions and friction points this paper is trying to address.

structured data
chain-of-thought
time-series forecasting
tabular prediction
test-time compute
Innovation

Methods, ideas, or system contributions that make the work stand out.

latent chain-of-thought
structured-data transformers
test-time compute
recurrent feedback tokens
tabular and time-series prediction