Multinomial thresholded LASSO for interpretable dimension reduction of human activity sequences

📅 2025-07-23
🤖 AI Summary
This paper addresses interpretable dimension reduction for high-dimensional human activity sequences, such as smartphone log data, that exhibit both short- and long-term temporal dependence. It proposes a regression-based framework for identifying the key time points that distinguish sequence types, and it introduces and empirically evaluates the thresholded LASSO for categorical time-series data. Unlike the standard LASSO, the thresholded LASSO induces sparsity at two levels, in the individual model coefficients and in the temporal positions, so that the most discriminative time points can be singled out and the sequences compressed in an interpretable way. The base model is multinomial logistic regression, and the paper compares regularization strategies including the LASSO and the thresholded LASSO. On real-world time-use data collected through a smartphone application, the thresholded LASSO performs better than more established methods. The work targets a gap the authors identify, namely that little principled dimension reduction has been developed specifically for categorical sequences, and aims to combine statistical rigor with interpretability in modeling temporal behavior.
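A minimal sketch of the thresholded-LASSO idea summarized above, under stated assumptions: sequences are one-hot coded position by position, an L1-penalized multinomial logistic regression is fitted by proximal gradient descent (ISTA), and a position is kept only when the L2 norm of all its coefficients exceeds a fraction `rel_tau` of the largest position score. The solver, the position-level norm, the relative threshold and the function names `fit_multinomial_lasso` and `thresholded_positions` are illustrative assumptions, not the paper's exact procedure. The data are synthetic, with the class determined only by positions 2 and 7.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max to stabilize exp
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_multinomial_lasso(X, y, n_classes, lam, n_iter=2000):
    """L1-penalized multinomial logistic regression via proximal gradient (ISTA).

    Minimizes mean cross-entropy + lam * sum(|W|); intercepts are not penalized.
    (Hypothetical helper for illustration.)
    """
    n, p = X.shape
    Y = np.eye(n_classes)[y]                      # one-hot class labels, shape (n, C)
    Xa = np.hstack([X, np.ones((n, 1))])          # design plus intercept column
    # Step 1/L, using the bound: softmax loss curvature <= 1/2 * sigma_max(Xa)^2 / n
    step = 2.0 * n / np.linalg.norm(Xa, 2) ** 2
    W = np.zeros((p, n_classes))
    b = np.zeros(n_classes)
    for _ in range(n_iter):
        R = softmax(X @ W + b) - Y                # residuals P - Y
        W = W - step * (X.T @ R) / n              # gradient step on coefficients
        b = b - step * R.mean(axis=0)             # gradient step on intercepts
        # Proximal step for the L1 penalty: soft-thresholding of every coefficient
        W = np.sign(W) * np.maximum(np.abs(W) - step * lam, 0.0)
    return W, b

def thresholded_positions(W, n_positions, n_categories, rel_tau=0.5):
    """Score each position by the L2 norm of all its coefficients
    (every category x every class); keep positions above rel_tau * max score.
    (Hypothetical helper; rows of W are ordered position-major.)"""
    scores = np.linalg.norm(W.reshape(n_positions, n_categories * W.shape[1]), axis=1)
    return np.flatnonzero(scores > rel_tau * scores.max()), scores

# Synthetic example: only positions 2 and 7 carry class information.
rng = np.random.default_rng(0)
n, T, K = 400, 10, 3                              # sequences, positions, activity categories
seqs = rng.integers(0, K, size=(n, T))
y = ((seqs[:, 2] == 0) | (seqs[:, 7] == 1)).astype(int)
X = np.eye(K)[seqs].reshape(n, T * K)             # column t*K + k <=> category k at position t
W, b = fit_multinomial_lasso(X, y, n_classes=2, lam=0.05)
kept, scores = thresholded_positions(W, T, K)
print(kept)
```

The L1 penalty alone can leave small nonzero coefficients scattered over many positions; the hard threshold on position-level scores is what turns coefficient sparsity into a short list of positions. Refitting the model on the retained positions is a common follow-up step but is omitted here.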

📝 Abstract
The widespread collection of data from mobile and wearable devices has created unprecedented opportunities to study human behavior at fine temporal resolution. One common structure for such data is categorical sequences: ordered, multinomial observations across many time points. These sequences present unique statistical challenges due to their high dimensionality and complex temporal dependence, including both short- and long-term correlations. Yet there has been relatively little methodological development focusing on principled dimension reduction specifically tailored to this type of data. In this paper, we develop and evaluate approaches to identifying "key" sequence positions that distinguish sequence types. We frame this challenge as a regression problem, introduce a variety of regularization techniques that could be applied to achieve position-based dimension reduction, and evaluate them on a motivating dataset of daily time-use patterns collected via a smartphone application. Results show that the thresholded LASSO, a relatively underused technique, performs better than more established methods for data with complex sequential structure.
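Framing position selection as a regression problem requires a design matrix in which every column can be traced back to a sequence position, so that keeping or dropping a position means keeping or dropping a block of columns. A minimal sketch, assuming equal-length sequences and position-major one-hot coding; the function name `encode_sequences` and the activity labels are hypothetical, not taken from the paper.

```python
import numpy as np

def encode_sequences(sequences):
    """One-hot encode equal-length categorical sequences, position by position.

    Returns the design matrix X (n x T*K), the sorted category list, and an
    array giving the sequence position each column belongs to.
    (Hypothetical helper for illustration.)
    """
    T = len(sequences[0])
    if any(len(s) != T for s in sequences):
        raise ValueError("all sequences must have the same length")
    categories = sorted({c for s in sequences for c in s})
    index = {c: k for k, c in enumerate(categories)}
    K = len(categories)
    X = np.zeros((len(sequences), T * K))
    for i, s in enumerate(sequences):
        for t, c in enumerate(s):
            X[i, t * K + index[c]] = 1.0          # column t*K + k <=> category k at position t
    col_position = np.repeat(np.arange(T), K)     # [0,..,0, 1,..,1, ...], K entries per position
    return X, categories, col_position

# Toy time-use diaries: one activity label per time slot (hypothetical labels).
seqs = [["sleep", "work", "work", "leisure"],
        ["sleep", "leisure", "work", "sleep"],
        ["work", "work", "leisure", "sleep"]]
X, cats, col_pos = encode_sequences(seqs)
print(X.shape, cats)  # (3, 12) ['leisure', 'sleep', 'work']

# Position-based dimension reduction: keep only the columns of selected positions.
keep = [0, 2]
X_reduced = X[:, np.isin(col_pos, keep)]
print(X_reduced.shape)  # (3, 6)
```

All K indicators are kept for every position, which is common when an L1 penalty is applied; an unpenalized fit would usually drop one reference category per position to avoid collinearity with the intercept.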
Problem

Identify key sequence positions distinguishing human activity types
Address high dimensionality and complex temporal dependence in sequences
Evaluate regularization techniques for dimension reduction in categorical data
Innovation

Multinomial thresholded LASSO technique
Interpretable dimension reduction method
Key sequence positions identification