🤖 AI Summary
Transformers often underperform RNNs in event sequence classification, particularly when predicting future targets, because they lack a single global state vector and because contrastive pretraining of embeddings weakens local contextual modeling.
Method: We propose the history token mechanism: during Transformer pretraining, we explicitly introduce learnable, cumulative history tokens that dynamically aggregate prefix information, yielding compact, globally aware sequence representations while preserving fine-grained temporal sensitivity. Our approach builds on the standard Transformer architecture and jointly optimizes contrastive learning with next-token prediction.
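To make the idea of cumulative prefix aggregation concrete, here is a minimal NumPy sketch: each position's history state is a running mean over the prefix of token embeddings, seeded by a learnable initial history token. This is an illustrative aggregation rule of our own choosing (the function name `history_token_states` and the mean update are assumptions), not the paper's exact mechanism.

```python
import numpy as np

def history_token_states(token_embs: np.ndarray, h0: np.ndarray) -> np.ndarray:
    """Aggregate prefix information into per-position history states.

    token_embs: (T, d) sequence of token embeddings.
    h0:         (d,) learnable initial history token.
    Returns:    (T, d) where row t is the mean of h0 and embeddings 0..t.
    """
    T, d = token_embs.shape
    states = np.empty((T, d))
    acc = h0.copy()  # running sum, starting from the initial history token
    for t in range(T):
        acc = acc + token_embs[t]
        states[t] = acc / (t + 2)  # t+1 tokens seen so far, plus h0
    return states
```

In a Transformer, such states would be exposed as extra tokens the attention layers can read, giving each position a compact summary of its entire prefix rather than only pairwise attention over raw tokens.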
Contribution/Results: Evaluated across financial risk assessment, e-commerce behavior forecasting, and clinical event classification, our method consistently outperforms state-of-the-art models, achieving average classification accuracy gains of 3.2–5.7%. These results empirically validate the critical importance of explicit historical state modeling for sequential prediction tasks.
📝 Abstract
Deep learning has achieved remarkable success in modeling sequential data, including event sequences, temporal point processes, and irregular time series. Recently, transformers have largely replaced recurrent networks in these tasks. However, transformers often underperform RNNs in classification tasks where the objective is to predict future targets. The reason behind this performance gap remains largely unexplored. In this paper, we identify a key limitation of transformers: the absence of a single state vector that provides a compact and effective representation of the entire sequence. Additionally, we show that contrastive pretraining of embedding vectors fails to capture local context, which is crucial for accurate prediction. To address these challenges, we introduce history tokens, a novel concept that facilitates the accumulation of historical information during next-token prediction pretraining. Our approach significantly improves transformer-based models, delivering consistent gains on finance, e-commerce, and healthcare tasks. The code is publicly available on GitHub.