🤖 AI Summary
Existing Transformer-based marked temporal point process (MTPP) approaches mostly inject temporal information only through positional encodings, which limits their ability to capture heterogeneous, event-type-specific temporal dynamics. To address this limitation, this work proposes Hawkes Attention, an attention mechanism derived from the theory of multivariate Hawkes processes. Specifically, it introduces learnable type-specific neural kernels that modulate the query, key, and value projections, so that event content and temporal dynamics are modeled jointly. In the reported experiments, the proposed method outperforms the baselines on MTPP tasks, and it also extends to more specific temporal structures such as time series forecasting.
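For context, the textbook form of the multivariate Hawkes conditional intensity is sketched below. This is the standard definition, not a formula taken from this work; the symbols (baseline rate \mu_k, excitation kernel \phi_{k,k'}, event times t_i with types k_i) are notation assumed here for illustration.

```latex
% Intensity of event type k at time t, given the history of past events (t_i, k_i):
\lambda_k(t) \;=\; \mu_k \;+\; \sum_{i \,:\, t_i < t} \phi_{k,\,k_i}\!\left(t - t_i\right),
\qquad \mu_k \ge 0,\quad \phi_{k,k'}(\cdot) \ge 0 .

% A common parametric choice is the exponential decay kernel:
\phi_{k,k'}(\Delta t) \;=\; \alpha_{k,k'}\, e^{-\beta_{k,k'}\,\Delta t}, \qquad \Delta t > 0 .
```

Each kernel \phi_{k,k'} describes how a past event of type k' excites future events of type k. Replacing a fixed parametric decay of this kind with a learned, type-specific function is the idea the summary refers to as "learnable type-specific neural kernels".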
📝 Abstract
Marked Temporal Point Processes (MTPPs) arise naturally in medical, social, commercial, and financial domains. However, existing Transformer-based methods mostly inject temporal information only via positional encodings and rely on shared or parametric decay structures, which limits their ability to capture heterogeneous, type-specific temporal effects. Motivated by this observation, we derive a novel attention operator for MTPPs, called Hawkes Attention, from multivariate Hawkes process theory. It uses learnable per-type neural kernels to modulate the query, key, and value projections, replacing the corresponding components of standard attention. With this design, Hawkes Attention unifies event timing and content interaction, learning both time-dependent behavior and type-specific excitation patterns from the data. Experimental results show that our method outperforms the baselines. Beyond the general MTPP setting, our attention mechanism can also be easily applied to specific temporal structures, such as time series forecasting.
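To make the idea of kernel-modulated projections concrete, here is a minimal NumPy sketch of one possible reading of the mechanism. It is not the authors' implementation: the element-wise gating form, the small per-type MLP kernel, the softplus output, the causal mask, and all names (`type_kernel`, `hawkes_attention`, `init_kernel`) are assumptions made for illustration.

```python
import numpy as np

def init_kernel(rng, num_types, hidden, d):
    """Hypothetical per-type kernel parameters: one tiny MLP per event type."""
    return (rng.normal(size=(num_types, hidden)),                      # W1
            rng.normal(size=(num_types, hidden)),                      # b1
            rng.normal(size=(num_types, hidden, d)) / np.sqrt(hidden),  # W2
            np.zeros((num_types, d)))                                  # b2

def type_kernel(dt, types, W1, b1, W2, b2):
    """Map each time gap dt[i, j] to a positive d-dim gate, using the
    parameters of the event type given in types[i, j]."""
    hidden = np.tanh(dt[..., None] * W1[types] + b1[types])          # (n, n, h)
    out = np.einsum("ijh,ijhd->ijd", hidden, W2[types]) + b2[types]  # (n, n, d)
    return np.logaddexp(0.0, out)  # softplus keeps the gates positive

def hawkes_attention(x, times, types, p):
    """Causal attention whose q/k/v projections are gated by per-type
    kernels of the time gap. Assumes events are sorted by time.
    Returns (outputs of shape (n, d), attention weights of shape (n, n))."""
    n, d = x.shape
    q, k, v = x @ p["Wq"], x @ p["Wk"], x @ p["Wv"]
    # Gap from source event j to target event i (clipped for future pairs,
    # which are masked out below anyway).
    dt = np.maximum(times[:, None] - times[None, :], 0.0)
    tgt = np.broadcast_to(types[:, None], (n, n))  # type of the querying event
    src = np.broadcast_to(types[None, :], (n, n))  # type of the attended event
    gq = type_kernel(dt, tgt, *p["kq"])
    gk = type_kernel(dt, src, *p["kk"])
    gv = type_kernel(dt, src, *p["kv"])
    scores = np.einsum("ijd,ijd->ij", q[:, None, :] * gq, k[None, :, :] * gk)
    scores = scores / np.sqrt(d)
    mask = np.tril(np.ones((n, n), dtype=bool))    # attend to self and the past
    scores = np.where(mask, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights = weights / weights.sum(axis=1, keepdims=True)
    out = np.einsum("ij,ijd->id", weights, v[None, :, :] * gv)
    return out, weights

# Toy example: 5 events, 3 event types, 4-dim embeddings.
rng = np.random.default_rng(0)
n, d, num_types, hidden = 5, 4, 3, 8
x = rng.normal(size=(n, d))
times = np.sort(rng.uniform(0.0, 10.0, size=n))
types = rng.integers(0, num_types, size=n)
params = {name: rng.normal(size=(d, d)) / np.sqrt(d) for name in ("Wq", "Wk", "Wv")}
for name in ("kq", "kk", "kv"):
    params[name] = init_kernel(rng, num_types, hidden, d)

out, weights = hawkes_attention(x, times, types, params)
print(out.shape, weights.shape)
```

In this sketch the time gap enters through the gates rather than through a positional encoding added to the input, and two events with the same gap but different types receive different gates. That is the type-specific behavior the abstract contrasts with shared decay structures.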