Impact of Data Breadth and Depth on Performance of Siamese Neural Network Model: Experiments with Three Keystroke Dynamic Datasets

📅 2025-01-10

🤖 AI Summary

This study investigates how data breadth (number of subjects) and data depth (per-subject sample count and keystroke sequence length) affect Siamese network performance in keystroke dynamics authentication. Using three public datasets (Aalto, CMU, and Clarkson II), we employ triplet-loss training and feature-space density analysis, and run ablations over the number of training subjects, samples per subject, amount of data per sample, and number of training triplets. We quantitatively demonstrate that increasing subject count markedly improves cross-user generalization, while the effect of per-subject data depth depends on text type: free-text authentication performance is jointly constrained by sample size and sequence length, whereas fixed-text scenarios are more robust. Crucially, we establish that expanding the number of subjects yields greater accuracy gains than increasing per-subject data volume. Our core contribution is a data configuration principle for behavioral biometric modeling, grounded in empirical evidence of the trade-off between dataset scale and authentication accuracy, which gives practitioners reproducible, deployment-oriented guidance for balancing resource constraints against system performance.
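As an illustration of the triplet-loss objective mentioned in the summary, here is a minimal sketch in plain Python. The Euclidean distance, the margin value of 1.0, and the function names `euclidean` and `triplet_loss` are assumptions made for this example; the page does not state which distance or margin the authors used.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    anchor and positive are embeddings from the same subject,
    negative is an embedding from a different subject.
    The margin default is an illustrative assumption.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# The negative is already far from the anchor, so the loss is zero.
easy = triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])
# The positive is farther than the negative, so the loss is positive.
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])
```

The loss is zero once the same-subject pair is closer than the different-subject pair by at least the margin, which is what pushes embeddings of different typists apart in feature space.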

📝 Abstract

Deep learning models, such as Siamese Neural Networks (SNNs), have shown great potential in capturing the intricate patterns in behavioral data. However, the impacts of dataset breadth (i.e., the number of subjects) and depth (e.g., the number of training samples per subject) on the performance of these models are often informally assumed and remain under-explored. To this end, we conducted extensive experiments on three publicly available keystroke datasets (Aalto, CMU, and Clarkson II), using the concepts of "feature space" and "density" to guide the experiments and to better understand the impact of dataset breadth and depth. By varying the number of training subjects, the number of samples per subject, the amount of data in each sample, and the number of triplets used in training, we found that, when feasible, increasing dataset breadth yields a well-trained model that effectively captures more inter-subject variability. In contrast, the extent of depth's impact depends on the nature of the dataset. Free-text datasets are sensitive to all depth-wise factors: inadequate samples per subject, sequence length, training triplets, or gallery sample size may each lead to an under-trained model. Fixed-text datasets are less affected by these factors, which makes it easier to obtain a well-trained model. These findings shed light on the importance of dataset breadth and depth in training deep learning models for behavioral biometrics and provide valuable insights for designing more effective authentication systems.
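The breadth and depth variations described in the abstract can be pictured as subsampling a dataset along three axes. The sketch below is a hypothetical illustration, not the authors' pipeline: the dictionary layout, the function name `subsample`, and the toy data are all assumptions.

```python
import random


def subsample(dataset, n_subjects, n_samples, seq_len, seed=0):
    """Return a reduced copy of a keystroke dataset.

    dataset maps a subject id to a list of samples; each sample is a
    list of keystroke events (layout assumed for illustration).
    n_subjects controls breadth; n_samples and seq_len control depth.
    """
    rng = random.Random(seed)
    # Breadth: keep a random subset of subjects.
    subjects = rng.sample(sorted(dataset), n_subjects)
    reduced = {}
    for subject in subjects:
        # Depth (samples per subject): keep a random subset of samples.
        samples = rng.sample(dataset[subject], n_samples)
        # Depth (data per sample): truncate each keystroke sequence.
        reduced[subject] = [sample[:seq_len] for sample in samples]
    return reduced


# Toy data: 5 subjects, 4 samples each, 10 placeholder events per sample.
toy = {f"user{i}": [list(range(10)) for _ in range(4)] for i in range(5)}
small = subsample(toy, n_subjects=3, n_samples=2, seq_len=5)
```

Sweeping `n_subjects` while holding the other two fixed isolates the breadth effect, and sweeping `n_samples` or `seq_len` isolates the two depth effects.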

Problem

Research questions and friction points this paper is trying to address.

Siamese Neural Network

Typing Rhythm Recognition

Data Set Characteristics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Siamese Neural Networks

Typing Rhythm Dataset

Behavioral Biometrics
