Transformer-Based Modeling of User Interaction Sequences for Dwell Time Prediction in Human-Computer Interfaces

📅 2025-12-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of accurately predicting dwell time on interface elements in human-computer interaction. We propose the first end-to-end Transformer-based modeling approach for this task. Our method uniformly embeds multimodal interaction sequences—including clicks, scrolls, dwell durations, and contextual features—and integrates positional encoding with multi-head self-attention to effectively capture long-range dependencies and dynamic behavioral couplings. Sensitivity analysis demonstrates strong robustness and generalization across varying numbers of attention heads, window lengths, and cross-device scenarios. Extensive experiments show that our model consistently outperforms BiLSTM, DRFormer, FedFormer, and iTransformer across four key metrics: MSE, RMSE, MAPE, and RMAE. It significantly enhances modeling accuracy for complex, heterogeneous interaction patterns, thereby providing a reliable foundation for adaptive UI design and accessibility optimization.

📝 Abstract
This study investigates the task of dwell time prediction and proposes a Transformer framework based on interaction behavior modeling. The method first represents user interaction sequences on the interface by integrating dwell duration, click frequency, scrolling behavior, and contextual features, mapping them into a unified latent space through embedding and positional encoding. On this basis, a multi-head self-attention mechanism captures long-range dependencies, while a feed-forward network performs deep nonlinear transformations to model the dynamic patterns of dwell time. Comparative experiments are conducted under identical conditions with BiLSTM, DRFormer, FedFormer, and iTransformer as baselines. The results show that the proposed method achieves the best performance in terms of MSE, RMSE, MAPE, and RMAE, and captures the complex patterns in interaction behavior more accurately. In addition, sensitivity experiments on hyperparameters and environments examine the impact of the number of attention heads, sequence window length, and device environment on prediction performance, further demonstrating the method's robustness and adaptability. Overall, this study provides a new solution for dwell time prediction from both theoretical and methodological perspectives and verifies its effectiveness in multiple respects.
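The pipeline the abstract describes (embedding, positional encoding, multi-head self-attention, feed-forward regression) can be sketched in NumPy. This is an illustrative reconstruction, not the authors' code: all dimensions, weight initializations, and the pooled regression head are assumptions, and the weights below are untrained random values.

```python
import numpy as np

rng = np.random.default_rng(0)

def positional_encoding(seq_len, d_model):
    """Standard sinusoidal positional encoding."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, num_heads, w_q, w_k, w_v, w_o):
    """x: (seq_len, d_model); each weight matrix: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Project, then split into heads: (heads, seq_len, d_head).
    q = (x @ w_q).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ w_k).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ w_v).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    attn = softmax(scores, axis=-1)
    out = (attn @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o

# Toy interaction sequence: each step = [dwell, clicks, scroll, context].
seq_len, d_feat, d_model, heads = 8, 4, 16, 4
events = rng.normal(size=(seq_len, d_feat))

# Embed the multimodal features and add positional information.
w_embed = rng.normal(scale=0.1, size=(d_feat, d_model))
x = events @ w_embed + positional_encoding(seq_len, d_model)

w_q, w_k, w_v, w_o = [rng.normal(scale=0.1, size=(d_model, d_model))
                      for _ in range(4)]
h = multi_head_self_attention(x, heads, w_q, w_k, w_v, w_o)

# Feed-forward (ReLU) plus a pooled linear head for the dwell-time estimate.
w1 = rng.normal(scale=0.1, size=(d_model, d_model))
w2 = rng.normal(scale=0.1, size=(d_model, 1))
dwell_pred = float(np.maximum(h @ w1, 0.0).mean(axis=0) @ w2)
print(dwell_pred)
```

A trained model would add residual connections, layer normalization, and stacked encoder blocks; this sketch only shows the attention-over-interaction-sequence idea at the core of the method.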
Problem

Research questions and friction points this paper is trying to address.

Predicts dwell time using Transformer and user interaction sequences.
Models dynamic patterns with self-attention and feed-forward networks.
Outperforms baselines in accuracy and robustness across metrics.
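The evaluation metrics named above are standard regression errors; a minimal NumPy sketch of three of them follows, with illustrative dwell-time values in seconds. RMAE is omitted because its definition varies across papers and the source does not spell it out.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(mse(y_true, y_pred)))

def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes no zero dwell times."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# Illustrative dwell times (seconds), not data from the paper.
y_true = np.array([2.0, 4.0, 6.0])
y_pred = np.array([2.5, 3.5, 6.0])
print(mse(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```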
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer framework models user interaction sequences
Multi-head self-attention captures long-range dependencies
Embedding integrates dwell, click, scroll, and context features
Rui Liu
University of Melbourne, Melbourne, Australia
Runsheng Zhang
University of Southern California, Los Angeles, USA
Shixiao Wang
Senior Lecturer, Department of Mathematics, Auckland University
Fluid mechanics; Nonlinear partial differential equations