📝 Abstract
Longitudinal data is commonly used across various domains, such as health, biomedical, education and survey studies. This ubiquity has led to a rise in statistical, machine learning and deep learning-based methods for Longitudinal Data Classification (LDC). However, the intricate nature of the data, characterised by its multi-dimensionality, gives rise to instance-level heterogeneity and temporal correlations that add to the complexity of longitudinal data analysis. Additionally, LDC accuracy is often hampered by the pervasiveness of missing values in longitudinal data. Although ongoing research draws on the generative power and utility of Generative Adversarial Networks (GANs) to address the missing data problem, critical considerations remain. These include the statistical assumptions surrounding longitudinal data and the missingness within it, as well as other data-level challenges, such as class imbalance and mixed data types, that affect longitudinal data imputation (LDI) and the subsequent LDC process in GANs. This paper provides a comprehensive overview of how GANs have been applied in LDI, with a focus on whether GANs have adequately addressed fundamental assumptions about the data from an LDC perspective. We propose a categorisation of the main approaches to GAN-based LDI, highlight the strengths and limitations of existing methods, identify key research trends, and provide promising future directions. Our findings indicate that while GANs show great potential for LDI to improve the usability and quality of longitudinal data for tasks like LDC, there is a need for more versatile approaches that can handle the wider spectrum of challenges presented by longitudinal data with missing values. By synthesising current knowledge and identifying critical research gaps, this survey aims to guide future research efforts in developing more effective GAN-based solutions to address LDC challenges.
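The abstract stresses that GAN-based imputation rests on assumptions about how values go missing. The Python sketch below is not taken from the survey. It is a minimal illustration, on a hypothetical simulated panel, of why the missingness mechanism matters. Under MCAR (missing completely at random) the observed cells are an unbiased sample. Under MNAR (missing not at random) they are not. It also builds the zero-filled data and hint matrix that a GAIN-style (Generative Adversarial Imputation Nets) imputer feeds to its networks. All function names, parameters (threshold, hint rate, seeds) and the simplified per-entry hint sampling are illustrative assumptions.

```python
import random
from statistics import mean

# All names and parameters below are hypothetical, chosen only for illustration.

def simulate_panel(n_subjects, n_times, seed=0):
    """Simulated longitudinal panel (rows = subjects, columns = time points):
    a subject-specific random intercept, a shared linear time trend, and noise."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_subjects):
        intercept = rng.gauss(0.0, 1.0)  # instance-level heterogeneity
        panel.append([intercept + 0.5 * t + rng.gauss(0.0, 0.3) for t in range(n_times)])
    return panel

def mcar_mask(panel, p_missing, seed=1):
    """MCAR: every cell is missing with the same probability, independent of any value.
    Mask convention: 1 = observed, 0 = missing."""
    rng = random.Random(seed)
    return [[0 if rng.random() < p_missing else 1 for _ in row] for row in panel]

def mnar_mask(panel, threshold, p_missing_high, seed=2):
    """MNAR: values above `threshold` go missing with probability `p_missing_high`,
    so missingness depends on the unobserved value itself."""
    rng = random.Random(seed)
    return [[0 if x > threshold and rng.random() < p_missing_high else 1 for x in row]
            for row in panel]

def observed_mean(panel, mask):
    """Mean of the observed cells only (a complete-case summary)."""
    return mean(x for row, mrow in zip(panel, mask) for x, m in zip(row, mrow) if m == 1)

def gain_style_inputs(panel, mask, hint_rate, seed=3):
    """GAIN-style inputs: zero-filled data plus a hint matrix that reveals each
    mask entry with probability `hint_rate` and sets the rest to 0.5.
    (Simplified per-entry hint sampling; an assumption, not the survey's method.)"""
    rng = random.Random(seed)
    filled = [[x if m == 1 else 0.0 for x, m in zip(row, mrow)]
              for row, mrow in zip(panel, mask)]
    hint = [[float(m) if rng.random() < hint_rate else 0.5 for m in mrow] for mrow in mask]
    return filled, hint

if __name__ == "__main__":
    panel = simulate_panel(n_subjects=500, n_times=5)
    true_mean = mean(x for row in panel for x in row)
    print("true mean:      ", round(true_mean, 3))
    print("observed (MCAR):", round(observed_mean(panel, mcar_mask(panel, 0.3)), 3))
    print("observed (MNAR):", round(observed_mean(panel, mnar_mask(panel, 2.0, 0.8)), 3))
```

Under the MCAR mask the observed mean stays close to the full-data mean. The MNAR mask removes mostly high values and pulls the observed mean down, so an imputer fitted only to observed cells can inherit that bias unless the mechanism is accounted for. In the hint matrix, 0.5 entries mark cells whose observed/missing status is withheld from the discriminator.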