Imputation of Longitudinal Data Using GANs: Challenges and Implications for Classification

📅 2025-06-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Longitudinal data classification (LDC) is complicated by missing values, temporal correlations, instance-level heterogeneity, class imbalance, and mixed data types. This survey reviews how Generative Adversarial Networks (GANs) have been applied to longitudinal data imputation (LDI), focusing on whether existing methods adequately address the statistical assumptions about longitudinal data and its missingness from an LDC perspective. It proposes a categorisation of the main GAN-based LDI approaches, highlights their strengths and limitations, identifies key research trends, and outlines future directions. The authors conclude that GANs show great potential for LDI, but that more versatile approaches are needed to handle the wider spectrum of challenges posed by longitudinal data with missing values.

📝 Abstract

Longitudinal data is commonly utilised across various domains, such as health, biomedical, education and survey studies. This ubiquity has led to a rise in statistical, machine and deep learning-based methods for Longitudinal Data Classification (LDC). However, the intricate nature of the data, characterised by its multi-dimensionality, causes instance-level heterogeneity and temporal correlations that add to the complexity of longitudinal data analysis. Additionally, LDC accuracy is often hampered by the pervasiveness of missing values in longitudinal data. Despite ongoing research that draws on the generative power and utility of Generative Adversarial Networks (GANs) to address the missing data problem, critical considerations remain: the statistical assumptions surrounding longitudinal data and the missingness within it, as well as other data-level challenges, such as class imbalance and mixed data types, that impact longitudinal data imputation (LDI) and the subsequent LDC process in GANs. This paper provides a comprehensive overview of how GANs have been applied in LDI, with a focus on whether GANs have adequately addressed fundamental assumptions about the data from an LDC perspective. We propose a categorisation of the main approaches to GAN-based LDI, highlight the strengths and limitations of these methods, identify key research trends, and provide promising future directions. Our findings indicate that while GANs show great potential for LDI to improve the usability and quality of longitudinal data for tasks like LDC, there is a need for more versatile approaches that can handle the wider spectrum of challenges presented by longitudinal data with missing values. By synthesising current knowledge and identifying critical research gaps, this survey aims to guide future research efforts in developing more effective GAN-based solutions to address LDC challenges.
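
To make the imputation setting in the abstract concrete, here is a minimal Python sketch of the data layout it implies: a subjects × visits × features array, a binary observation mask, and a combine step that keeps observed values and fills only the missing ones. This mask-based combine step is common in GAN-based imputers, but nothing here is taken from the survey itself. The array shapes, the 0.7 observation rate, the completely-at-random missingness, and the names `placeholder_generator` and `combine` are all illustrative assumptions, and the per-feature mean merely stands in for a trained generator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy panel: 4 subjects, 3 visits, 2 features.
x_full = rng.normal(size=(4, 3, 2))

# Assumed missingness: each entry is observed with probability 0.7,
# independently of the data (missing completely at random).
# Mask convention: 1 = observed, 0 = missing.
mask = (rng.random(x_full.shape) < 0.7).astype(float)
x_obs = np.where(mask == 1, x_full, np.nan)


def placeholder_generator(x, m):
    """Stand-in for a trained generator: per-feature mean of observed values."""
    counts = m.sum(axis=(0, 1))                # observed entries per feature
    sums = np.nansum(x, axis=(0, 1))           # sum over subjects and visits
    means = sums / np.maximum(counts, 1)       # avoid division by zero
    return np.broadcast_to(means, x.shape)


def combine(x, m, g_out):
    """Keep observed entries; take only the missing ones from g_out."""
    return m * np.nan_to_num(x) + (1 - m) * g_out


x_imputed = combine(x_obs, mask, placeholder_generator(x_obs, mask))
```

In this sketch the observed entries pass through unchanged, so any downstream classifier sees real measurements wherever they exist. A mean fill ignores temporal correlation within a subject, which is one of the assumptions the abstract says imputation methods need to respect.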

Problem

Research questions and friction points this paper is trying to address.

Address missing values in longitudinal data using GANs

Handle class imbalance and mixed data types in LDI

Improve GAN-based LDI for better Longitudinal Data Classification

Innovation

Methods, ideas, or system contributions that make the work stand out.

GANs for longitudinal data imputation

Addressing missing values in LDC

Handling class imbalance and mixed data

🔎 Similar Papers

How Deep is your Guess? A Fresh Perspective on Deep Learning for Medical Time-Series Imputation

2024-07-11 | arXiv.org | Citations: 2
