From Word Sequences to Behavioral Sequences: Adapting Modeling and Evaluation Paradigms for Longitudinal NLP

📅 2026-01-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Traditional NLP treats documents as independent, unordered samples, neglecting the sequential structure of temporally ordered texts produced by the same author in longitudinal studies and thereby compromising ecological validity. This work proposes a paradigm for longitudinal NLP built around individual-centered, time-ordered behavioral sequence modeling and evaluation. It distinguishes cross-sectional from prospective splits to separate across-person from across-time generalization, introduces evaluation metrics that disentangle within-individual from between-individual variability, adopts input structures that incorporate historical context by default, and explores pooling-based summaries, explicit dynamic modeling, and interaction-based architectures. Validation on a dataset of 17,000 daily diary entries paired with PTSD symptom assessments from 238 participants demonstrates that conventional document-level evaluation can yield misleading conclusions, underscoring the necessity and efficacy of the proposed paradigm.

📝 Abstract
While NLP typically treats documents as independent and unordered samples, in longitudinal studies this assumption rarely holds: documents are nested within authors and ordered in time, forming person-indexed, time-ordered $\textit{behavioral sequences}$. Here, we demonstrate the need for, and propose, a longitudinal modeling and evaluation paradigm that updates four parts of the NLP pipeline: (1) evaluation splits aligned to generalization over people ($\textit{cross-sectional}$) and/or time ($\textit{prospective}$); (2) accuracy metrics separating between-person differences from within-person dynamics; (3) sequence inputs that incorporate history by default; and (4) model internals that support different $\textit{coarseness}$ of latent state over histories (pooled summaries, explicit dynamics, or interaction-based models). We demonstrate the issues that arise from the traditional pipeline, and our proposed improvements, on a dataset of 17k daily diary transcripts paired with PTSD symptom severity from 238 participants, finding that traditional document-level evaluation can yield substantially different and sometimes reversed conclusions compared to our ecologically valid modeling and evaluation. We tie our results to a broader discussion motivating a shift from word-sequence evaluation toward $\textit{behavior-sequence}$ paradigms for NLP.
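Two of the pipeline changes above, split design and metric decomposition, can be made concrete with a small sketch. This is hypothetical illustrative code, not the authors' implementation; the record layout, function names, and the mean-absolute-error decomposition are assumptions chosen to mirror the abstract's distinction between generalizing over people (cross-sectional), generalizing over time (prospective), and separating between-person differences from within-person dynamics.

```python
# Hypothetical sketch of longitudinal splits and a between-/within-person
# error decomposition; not from the paper itself.
from collections import defaultdict

# Toy records: (person_id, day_index, predicted_severity, true_severity)
records = [
    ("p1", 0, 2.0, 2.5), ("p1", 1, 2.2, 2.4), ("p1", 2, 3.0, 3.1),
    ("p2", 0, 4.0, 4.2), ("p2", 1, 3.8, 4.0), ("p2", 2, 3.5, 3.6),
    ("p3", 0, 1.0, 1.2), ("p3", 1, 1.5, 1.4), ("p3", 2, 1.8, 2.0),
]

def cross_sectional_split(records, test_people):
    """Hold out whole persons: train and test share no authors."""
    train = [r for r in records if r[0] not in test_people]
    test = [r for r in records if r[0] in test_people]
    return train, test

def prospective_split(records, cutoff_day):
    """Hold out the future: test only on days at or after the cutoff."""
    train = [r for r in records if r[1] < cutoff_day]
    test = [r for r in records if r[1] >= cutoff_day]
    return train, test

def between_within_mae(records):
    """Decompose absolute error into a between-person part (error of each
    person's mean level) and a within-person part (error of day-to-day
    deviations around that mean)."""
    by_person = defaultdict(list)
    for pid, _, pred, true in records:
        by_person[pid].append((pred, true))
    between, within, n = 0.0, 0.0, 0
    for rows in by_person.values():
        pred_mean = sum(p for p, _ in rows) / len(rows)
        true_mean = sum(t for _, t in rows) / len(rows)
        between += len(rows) * abs(pred_mean - true_mean)
        within += sum(abs((p - pred_mean) - (t - true_mean)) for p, t in rows)
        n += len(rows)
    return between / n, within / n
```

The point the sketch makes is the paper's: a model can score well on pooled document-level error while capturing only between-person level differences, so reporting the two components separately (and evaluating under both split types) can reverse conclusions drawn from a single pooled metric.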
Problem

Research questions and friction points this paper is trying to address.

longitudinal NLP
behavioral sequences
cross-sectional evaluation
prospective evaluation
within-person dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

longitudinal NLP
behavioral sequences
prospective evaluation
within-person dynamics
sequence modeling