Latent Chain-of-Thought Improves Structured-Data Transformers

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited reasoning capability of Transformers on structured data, such as time series and tabular datasets, by introducing a latent chain-of-thought mechanism. After an initial forward pass, the hidden states at query positions are compressed into feedback tokens, appended to the input, and processed again, so the same weights are reused over multiple rounds of latent computation before prediction. The paper compares this against a same-depth baseline without chain-of-thought, a deeper baseline matched in effective depth, and a looped transformer with weight-tied recurrence but no additional tokens. Across 36 datasets, latent chain-of-thought improves over the baseline on 8 of 9 time-series datasets (10.99% average gain) and 22 of 27 tabular datasets (5.31% average gain), and the chain-of-thought models perform best on average in both settings.

📝 Abstract

Chain-of-thought and more broadly test-time compute are known to augment the expressive capabilities of language models and have led to major innovations in reasoning. Motivated by this success, this paper explores latent chain-of-thought as well as the impact of depth and looping for time-series and tabular data. We propose a recurrent scheme in which a structured-data transformer, after an initial forward pass, compresses its query-position hidden states into feedback tokens that are appended to the input and processed again, allowing multiple rounds of latent computation before prediction. We compare CoT models against a same-depth no-CoT baseline, a deeper baseline matched to the CoT model in effective depth, and a looped transformer with weight-tied recurrence but no additional chain-of-thought tokens. Across 36 datasets in time-series forecasting and tabular prediction, latent chain-of-thought improves over the baseline on 8/9 time-series datasets (+10.99% average gain) and 22/27 tabular datasets (+5.31% average gain). Across both settings, the CoT models perform the best on average. These results demonstrate that chain-of-thought is a useful axis for scaling test-time compute for structured data.
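The control flow of the recurrent scheme in the abstract can be illustrated with a small sketch. It is not the authors' implementation. Every name here (`toy_encoder`, `compress`, `latent_cot`, `looped`) is hypothetical. The encoder is a parameter-free attention layer standing in for a real structured-data transformer, and the compressor is a simple mean-pool, since the abstract does not say how hidden states are compressed. The sketch also assumes that feedback tokens accumulate across rounds and are placed after the query tokens. The `looped` function shows the contrast with the weight-tied baseline that adds no tokens.

```python
import math
from typing import Callable, List

Vector = List[float]
Tokens = List[Vector]
Encoder = Callable[[Tokens], Tokens]


def toy_encoder(tokens: Tokens) -> Tokens:
    """Stand-in for the transformer: one parameter-free self-attention
    layer with a residual connection (not the paper's architecture)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product scores of this token against every token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # stable softmax numerator
        total = sum(weights)
        mixed = [sum(w * v[i] for w, v in zip(weights, tokens)) / total
                 for i in range(d)]
        out.append([x + m for x, m in zip(q, mixed)])
    return out


def compress(query_hidden: Tokens) -> Tokens:
    """Hypothetical compressor: mean-pool the query hidden states into a
    single feedback token and squash it with tanh."""
    n = len(query_hidden)
    d = len(query_hidden[0])
    pooled = [sum(h[i] for h in query_hidden) / n for i in range(d)]
    return [[math.tanh(x) for x in pooled]]


def latent_cot(context: Tokens, queries: Tokens,
               encoder: Encoder, rounds: int) -> Tokens:
    """Run one initial pass plus `rounds` extra passes with the same encoder,
    appending feedback tokens to the input before each extra pass."""
    feedback: Tokens = []
    start, stop = len(context), len(context) + len(queries)
    for step in range(rounds + 1):
        hidden = encoder(context + queries + feedback)
        query_hidden = hidden[start:stop]
        if step < rounds:
            # Assumption: feedback tokens accumulate rather than being replaced.
            feedback = feedback + compress(query_hidden)
    return query_hidden


def looped(context: Tokens, queries: Tokens,
           encoder: Encoder, loops: int) -> Tokens:
    """Looped baseline: reapply the same encoder to its own output,
    with no additional tokens."""
    hidden = context + queries
    for _ in range(loops + 1):
        hidden = encoder(hidden)
    return hidden[len(context):]


context = [[1.0, 0.0], [0.0, 1.0]]
queries = [[0.5, 0.5]]
no_cot = latent_cot(context, queries, toy_encoder, rounds=0)
with_cot = latent_cot(context, queries, toy_encoder, rounds=2)
```

With `rounds=0` the function reduces to a single forward pass, which corresponds to the same-depth no-CoT baseline. Each extra round lengthens the input by one feedback token in this sketch, while `looped` keeps the sequence length fixed and only repeats the computation.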

Problem

Research questions and friction points this paper is trying to address.

structured data

chain-of-thought

time-series forecasting

tabular prediction

test-time compute

Innovation

Methods, ideas, or system contributions that make the work stand out.

latent chain-of-thought

structured-data transformers

test-time compute