A Comparative Study of Traditional Machine Learning, Deep Learning, and Large Language Models for Mental Health Forecasting using Smartphone Sensing Data

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study addresses the prospective prediction of college students' mental health status from smartphone sensing data, with the goal of enabling timely intervention. Using a large-scale longitudinal dataset, it presents the first systematic comparison of traditional machine learning, deep learning (including Transformer architectures), and large language models (LLMs) for this task, and evaluates the impact of time windows, feature granularities, personalization strategies, and class imbalance handling techniques. Results show that Transformer-based models achieve the best overall performance (Macro-F1 = 0.58), that LLMs are strong at contextual reasoning but weaker at temporal modeling, and that personalized modeling substantially improves forecasts of severe mental health states. The work establishes the first comprehensive benchmark for AI-driven mental health forecasting and underscores the importance of model selection and personalized design.

📝 Abstract
Smartphone sensing offers an unobtrusive and scalable way to track daily behaviors linked to mental health, capturing changes in sleep, mobility, and phone use that often precede symptoms of stress, anxiety, or depression. While most prior studies focus on detection that responds to existing conditions, forecasting mental health enables proactive support through Just-in-Time Adaptive Interventions. In this paper, we present the first comprehensive benchmarking study comparing traditional machine learning (ML), deep learning (DL), and large language model (LLM) approaches for mental health forecasting using the College Experience Sensing (CES) dataset, the most extensive longitudinal dataset of college student mental health to date. We systematically evaluate models across temporal windows, feature granularities, personalization strategies, and class imbalance handling. Our results show that DL models, particularly Transformer (Macro-F1 = 0.58), achieve the best overall performance, while LLMs show strength in contextual reasoning but weaker temporal modeling. Personalization substantially improves forecasts of severe mental health states. By revealing how different modeling approaches interpret phone sensing behavioral data over time, this work lays the groundwork for next-generation, adaptive, and human-centered mental health technologies that can advance both research and real-world well-being.
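The headline metric above is Macro-F1, the unweighted mean of per-class F1 scores. The sketch below illustrates why that metric suits imbalanced mental health labels: a model that always predicts the majority class can look accurate while scoring poorly on Macro-F1. The label names and the toy predictions are hypothetical and are not taken from the CES dataset or the paper's experiments.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of exact matches."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, imbalanced toy labels (assumption, for illustration only).
y_true = ["none"] * 8 + ["mild", "severe"]
y_pred = ["none"] * 10  # a majority-class predictor

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")
```

Because every class contributes equally to the average, the two missed minority classes pull Macro-F1 far below accuracy in this toy case, which is the behavior one would want when severe states are rare but matter most.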
Problem

Research questions and friction points this paper is trying to address.

mental health forecasting
smartphone sensing
just-in-time adaptive interventions
behavioral data
proactive support
Innovation

Methods, ideas, or system contributions that make the work stand out.

mental health forecasting
smartphone sensing
large language models
personalized modeling
Transformer
Kaidong Feng
Singapore University of Technology and Design, Singapore
Zhu Sun
Nanyang Technological University, Singapore
Roy Ka-Wei Lee
Singapore University of Technology and Design
Trust and Safety, Social Computing, Computational Social Science, Natural Language Processing
Xun Jiang
Tianqiao and Chrissy Chen Institute, Singapore and Theta Health Inc, Singapore
Y. Theng
Nanyang Technological University, Singapore
Yi Ding
Purdue University
AI/ML Systems, Sustainability, Healthcare