🤖 AI Summary
This study addresses the challenge of modeling progression of clustered, spatially associated multistate disease (such as periodontal disease) when only a single snapshot of each subject's disease state is available at a random inspection time. The authors propose a Bayesian semiparametric accelerated failure time model in which: (i) the systematic component of the event times follows a monotone single index model, chosen for clinical interpretability; (ii) the unknown link function is estimated via an integrated basis expansion whose coefficients receive constrained Gaussian process priors; (iii) spatial random effects are accommodated through an inverse-Wishart proposal; and (iv) the errors follow a Dirichlet process mixture of Gaussians. Parameter identifiability is established, and computation combines elliptical slice sampling, fast circulant embedding, and smoothing of hard constraints, yielding estimates of the parameters and of state occupation and transition probabilities. Simulation studies examine the finite-sample properties of the estimates and their performance under model misspecification, and the method is illustrated on a real clinical periodontal disease dataset.
📝 Abstract
Assessment of multistate disease progression is commonplace in biomedical research, for example in periodontal disease (PD). However, the presence of multistate current status endpoints, where only a single snapshot of each subject's progression through disease states is available at a random inspection time after a known starting state, complicates the inferential framework. In addition, these endpoints can be clustered and spatially associated: a group of proximally located teeth (within a subject) may experience more similar PD status than teeth located farther apart. Motivated by a clinical study recording PD progression, we propose a Bayesian semiparametric accelerated failure time model with an inverse-Wishart proposal for accommodating (spatial) random effects, and flexible errors that follow a Dirichlet process mixture of Gaussians. For clinical interpretability, the systematic component of the event times is modeled using a monotone single index model, with the (unknown) link function estimated via a novel integrated basis expansion whose basis coefficients are endowed with constrained Gaussian process priors. In addition to establishing parameter identifiability, we present scalable computing via a combination of elliptical slice sampling, fast circulant embedding techniques, and smoothing of hard constraints, leading to straightforward estimation of the parameters and of the state occupation and transition probabilities. Using synthetic data, we study the finite-sample properties of our Bayesian estimates and their performance under model misspecification. We also illustrate our method via an application to the real clinical PD dataset.
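
To make two of the computational ingredients named above more concrete, the following is a minimal sketch of a generic elliptical slice sampling update for a zero-mean Gaussian prior, together with one possible way of smoothing a hard positivity constraint. It is not the authors' implementation: the function names (`elliptical_slice_step`, `smoothed_positivity`, `run_chain`), the log-sigmoid surrogate, and the sharpness parameter `eta` are all assumptions introduced here for illustration, and the circulant embedding step is omitted.

```python
import numpy as np


def elliptical_slice_step(f, log_lik, chol_sigma, rng):
    """One elliptical slice sampling update.

    Targets a density proportional to N(f; 0, Sigma) * exp(log_lik(f)),
    where chol_sigma is a lower Cholesky factor of Sigma.
    """
    # Auxiliary draw from the Gaussian prior defines the ellipse.
    nu = chol_sigma @ rng.standard_normal(f.shape[0])
    # Log slice height under the current state.
    log_y = log_lik(f) + np.log(rng.uniform())
    # Initial angle and bracket.
    theta = rng.uniform(0.0, 2.0 * np.pi)
    theta_min, theta_max = theta - 2.0 * np.pi, theta
    while True:
        f_new = f * np.cos(theta) + nu * np.sin(theta)
        if log_lik(f_new) > log_y:
            return f_new
        # Shrink the bracket toward theta = 0 (the current state).
        if theta < 0.0:
            theta_min = theta
        else:
            theta_max = theta
        theta = rng.uniform(theta_min, theta_max)


def smoothed_positivity(f, eta=50.0):
    """Hypothetical smooth surrogate for the hard constraint f >= 0.

    Uses sum(log sigmoid(eta * f)); larger eta is closer to the hard indicator.
    """
    return -np.sum(np.logaddexp(0.0, -eta * f))


def run_chain(log_lik, chol_sigma, n_iter, seed=0):
    """Run n_iter elliptical slice updates starting from the zero vector."""
    rng = np.random.default_rng(seed)
    f = np.zeros(chol_sigma.shape[0])
    draws = np.empty((n_iter, f.shape[0]))
    for i in range(n_iter):
        f = elliptical_slice_step(f, log_lik, chol_sigma, rng)
        draws[i] = f
    return draws


if __name__ == "__main__":
    # Toy example: 3 coefficients with an identity prior covariance,
    # pushed toward non-negative values by the smoothed constraint.
    draws = run_chain(smoothed_positivity, np.eye(3), n_iter=1000)
    print(draws.mean(axis=0))
```

In a model of the kind described in the abstract, `log_lik` would combine the data likelihood with the smoothed constraint term, so that the Gaussian process prior on the basis coefficients is handled by the ellipse and the constraint is handled through the likelihood. An update of this form needs no step-size tuning, which is one common reason for pairing it with Gaussian priors.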