AI Summary
Prior to this work, tokenization strategies for gaze data, critical for integrating eye-tracking signals into large language models (LLMs) and multimodal LLMs (MLLMs), remained unexplored, leaving a fundamental gap in modality-specific preprocessing. Method: We systematically evaluate five tokenization approaches (quantile binning, k-means clustering, and linear, logarithmic, and adaptive binning), accounting for gaze sequences' continuity and statistical heterogeneity. Leveraging pre-trained MLLMs' visual encoders, we discretize gaze trajectories into learnable tokens compatible with LLM architectures. Contribution/Results: Quantile binning achieves the lowest error in spatial position prediction, while k-means excels in velocity prediction, demonstrating a strong dependence of tokenization efficacy on gaze distribution characteristics. Experiments across three benchmark eye-tracking datasets show consistent improvements in reconstruction accuracy, compression ratio, and downstream LLM performance. This work establishes the first principled framework for gaze tokenization, enabling effective integration of biological signals into multimodal foundation models.
Abstract
A considerable part of the performance of today's large language models (LLMs) and multimodal large language models (MLLMs) depends on their tokenization strategies. While tokenizers are extensively researched for textual and visual input, there is no research on tokenization strategies for gaze data, owing to its continuous, signal-like nature. However, a corresponding tokenization strategy would allow using the vision capabilities of pre-trained MLLMs for gaze data, for example, through fine-tuning. In this paper, we aim to close this research gap by analyzing five different tokenizers for gaze data on three different datasets for the forecasting and generation of gaze data through LLMs (cf.~\cref{fig:teaser}). We evaluate the tokenizers regarding their reconstruction and compression abilities. Further, we train an LLM for each tokenization strategy and measure its generative and predictive performance. Overall, we find that a quantile tokenizer outperforms all others in predicting gaze positions, while k-means is best when predicting gaze velocities.
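As a minimal illustration of two of the tokenizers compared above, the sketch below discretizes a one-dimensional stream of gaze coordinates into token indices using equal-frequency (quantile) bins and, alternatively, 1-D k-means centroids. The function names, bin counts, and synthetic data are illustrative assumptions and not the paper's implementation; the real tokenizers operate on full 2-D trajectories.

```python
# Hypothetical sketch of quantile-bin and k-means gaze tokenization.
# Assumed names and parameters; not the paper's code.
import numpy as np

def quantile_tokenize(values, n_bins=16):
    """Map each value to its equal-frequency (quantile) bin index in [0, n_bins)."""
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, edges)

def kmeans_tokenize(values, n_bins=16, n_iter=20, seed=0):
    """Map each value to its nearest 1-D k-means centroid (plain Lloyd iterations)."""
    rng = np.random.default_rng(seed)
    centroids = rng.choice(values, size=n_bins, replace=False).astype(float)
    for _ in range(n_iter):
        labels = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        for k in range(n_bins):
            if np.any(labels == k):          # keep empty clusters unchanged
                centroids[k] = values[labels == k].mean()
    return np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)

# Synthetic horizontal gaze coordinates, normalized to [0, 1].
gaze_x = np.random.default_rng(1).normal(0.5, 0.15, size=10_000).clip(0, 1)
tokens_q = quantile_tokenize(gaze_x)   # near-uniform token usage
tokens_k = kmeans_tokenize(gaze_x)     # density-adaptive centroids
```

The two schemes differ in how they spend the token budget: quantile bins guarantee roughly equal token frequencies regardless of the gaze distribution, whereas k-means places more centroids where samples concentrate, which is one plausible reason their relative strengths differ between position and velocity signals.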