Bridging Sequence and Graph Structure for Epigenetic Age Prediction

📅 2026-05-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing methods do not jointly model the co-methylation graph structure of DNA methylation sites and their sequence context within a unified framework. This work proposes a sequence-graph fusion architecture that takes eight handcrafted statistical features derived from DNA sequence (such as CpG density) and uses a lightweight gated modulation mechanism to adaptively scale each site's methylation signal before feeding it into a graph convolutional network for epigenetic age prediction. By capturing both sequence context and graph topology, the approach achieves a test mean absolute error (MAE) of 3.149 years on 3,707 blood samples, a 12.8% improvement over the strongest graph-based baseline. The handcrafted features also outperform end-to-end CNN-based sequence encoding, and post-hoc analysis shows that the importance of key sequence features such as CpG density shifts with age.
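To make the idea of handcrafted sequence features concrete, the sketch below computes a few simple statistics for a DNA window around a methylation site. Only CpG density and adenine frequency are named in the source; the GC content and thymine frequency entries, the function name `sequence_features`, and the windowing convention are assumptions made for illustration, and this is not the paper's eight-feature definition.

```python
from collections import Counter


def sequence_features(window: str) -> dict:
    """Simple statistics for a DNA window around a CpG site (illustrative only)."""
    seq = window.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence window")
    counts = Counter(seq)
    # Count CG dinucleotides over all adjacent base pairs in the window.
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    return {
        "cpg_density": cpg / (n - 1) if n > 1 else 0.0,  # named in the source
        "freq_a": counts["A"] / n,                       # named in the source
        "gc_content": (counts["G"] + counts["C"]) / n,   # assumed extra feature
        "freq_t": counts["T"] / n,                       # assumed extra feature
    }


features = sequence_features("ACGCGT")
print(features["cpg_density"])
```

Fixed statistics like these need no training, which may be one reason they hold up well against learned encoders when only a few thousand samples are available.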
📝 Abstract
Epigenetic clocks based on DNA methylation have emerged as powerful tools for estimating biological age, with broad applications in aging research, age-related disease studies, and longevity science. Despite advances across machine learning approaches to epigenetic age prediction, spanning penalised linear regression, deep feedforward networks, residual architectures, and graph neural networks, no existing method jointly models co-methylation graph structure and site-specific DNA sequence context within a unified framework. We propose a unified sequence-graph integration framework for epigenetic age prediction that addresses this gap, integrating eight-dimensional DNA sequence statistical features through a lightweight gated modulation mechanism that adaptively scales each site's methylation signal according to its sequence-determined biological relevance prior to graph convolution. Evaluated on 3,707 blood methylation samples against a comprehensive set of baselines, our method achieves a test MAE of 3.149 years, a 12.8% improvement over the strongest graph-based baseline. Biologically informed statistical features outperform CNN-based sequence encoding, demonstrating that handcrafted sequence features are more effective than end-to-end learned representations in this data regime. Post-hoc interpretability analysis identifies CpG density and local adenine frequency as features with age-dependent importance shifts, consistent with known mechanisms of age-related hypermethylation at CpG-dense promoter regions. Our code is at https://github.com/yaoli2022/graphage-seq.
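The gated modulation step described in the abstract can be sketched as follows: each site's methylation value is multiplied by a gate in (0, 1) computed from that site's sequence features, and the gated signal is then propagated over the co-methylation graph. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation. The sigmoid gate, the single scalar graph-convolution layer, the mean-pooling readout, and all function and parameter names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gate_methylation(beta, seq_feats, w_gate, b_gate):
    """Scale each site's methylation value by a gate computed from its sequence features."""
    gated = []
    for b, feats in zip(beta, seq_feats):
        g = sigmoid(sum(w * f for w, f in zip(w_gate, feats)) + b_gate)
        gated.append(g * b)
    return gated


def normalized_adjacency(adj):
    """Symmetric normalisation with self-loops: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def graph_conv(x, adj, weight):
    """One GCN-style layer on scalar node signals: ReLU(weight * (A_norm @ x))."""
    a_norm = normalized_adjacency(adj)
    return [max(0.0, weight * sum(a * v for a, v in zip(row, x))) for row in a_norm]


def predict_age(beta, seq_feats, adj, w_gate, b_gate, w_conv, w_out, b_out):
    """Gate, propagate over the graph, mean-pool the sites, apply a linear readout."""
    x = gate_methylation(beta, seq_feats, w_gate, b_gate)
    h = graph_conv(x, adj, w_conv)
    return w_out * (sum(h) / len(h)) + b_out


# Toy example: 3 sites, 8 sequence features each, no co-methylation edges.
beta = [0.2, 0.4, 0.6]
seq_feats = [[0.0] * 8 for _ in beta]
adj = [[0.0] * 3 for _ in range(3)]
pred = predict_age(beta, seq_feats, adj, [0.0] * 8, 0.0, 2.0, 10.0, 1.0)
print(pred)
```

Applying the gate before graph convolution means the sequence features change what each site contributes to its neighbours, rather than being concatenated as extra node inputs. In practice the weights would be learned jointly with the graph layers.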
Problem

Research questions and friction points this paper is trying to address.

epigenetic age prediction
DNA methylation
graph structure
sequence context
co-methylation
Innovation

Methods, ideas, or system contributions that make the work stand out.

sequence-graph integration
epigenetic age prediction
gated modulation
DNA methylation
graph neural networks
Authors
Yao Li
School of Computing and Information Systems, The University of Melbourne
Xikun Zhang
School of Computing Technologies, RMIT University
Xiaotao Shen
Lee Kong Chian School of Medicine, Nanyang Technological University Singapore
Sonika Tyagi
AI & Data Science Division, School of Computing Technologies, RMIT University
Artificial Intelligence, Data Science, Computational Biology, Digital Health, RNA
Xin Zheng
Assistant Professor, School of ICT, Griffith University
Data-centric Graph ML, Automated Graph MLOps
Jiaxing Huang
Department of Data Science and Artificial Intelligence, Hong Kong Polytechnic University
Feng Xia
Professor, School of Computing Technologies, RMIT University
Artificial Intelligence, Graph Learning, Brain, Robotics, Cyber-Physical Systems