🤖 AI Summary
Existing graph generative models are limited in structural representation learning and attribute prediction. To address this, we propose G2PT, an autoregressive generative framework built on a serialized graph representation: a concatenation of the node set and the edge set. G2PT encodes graph structure compactly as a token sequence and adapts the pre-trained Transformer paradigm to general-purpose graph generation, targeting domains such as molecular design and social network analysis. It learns graph structure via next-token prediction and uses separate fine-tuning strategies for two downstream applications: goal-oriented generation and graph property prediction. On generic graph and molecule benchmark datasets, G2PT achieves superior generation quality. In downstream tasks ranging from molecular design to graph property prediction, it shows strong adaptability.
📝 Abstract
Graph generation is a critical task in numerous domains, including molecular design and social network analysis, due to its ability to model complex relationships and structured data. While most modern graph generative models use adjacency matrix representations, this work revisits an alternative approach that represents each graph as a sequence formed from its node set and its edge set. We advocate for this approach because it encodes graphs efficiently, and we propose a novel representation of this kind. Based on this representation, we introduce the Graph Generative Pre-trained Transformer (G2PT), an autoregressive model that learns graph structures via next-token prediction. To further exploit G2PT's capabilities as a general-purpose foundation model, we explore fine-tuning strategies for two downstream applications: goal-oriented generation and graph property prediction. We conduct extensive experiments across multiple datasets. Results indicate that G2PT achieves superior generative performance on both generic graph and molecule datasets. Furthermore, G2PT exhibits strong adaptability and versatility in downstream tasks ranging from molecular design to property prediction.
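To make the sequence representation concrete, here is a minimal Python sketch of how a small graph could be flattened into a node-set-then-edge-set token sequence and turned into next-token prediction examples. The abstract does not specify G2PT's actual vocabulary or ordering, so everything here is an assumption made for illustration: the function names `serialize_graph` and `next_token_pairs`, the special tokens (`<bos>`, `<nodes>`, `<edges>`, `<eos>`), the node index tokens (`n0`, `n1`, ...), and the choice to list nodes before edges.

```python
from typing import Dict, List, Tuple

# Hypothetical token scheme for illustration only; not the G2PT vocabulary.


def serialize_graph(
    node_types: Dict[int, str],
    edges: List[Tuple[int, int, str]],
) -> List[str]:
    """Flatten a graph into one token sequence: node set first, then edge set."""
    tokens = ["<bos>", "<nodes>"]
    for idx in sorted(node_types):
        # Each node contributes its type token and its index token.
        tokens += [node_types[idx], f"n{idx}"]
    tokens.append("<edges>")
    for src, dst, label in edges:
        # Each edge contributes source index, destination index, and edge label.
        tokens += [f"n{src}", f"n{dst}", label]
    tokens.append("<eos>")
    return tokens


def next_token_pairs(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """Build (prefix, target) pairs, the training signal for next-token prediction."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


if __name__ == "__main__":
    # A three-atom toy molecule: C-C=O.
    nodes = {0: "C", 1: "C", 2: "O"}
    bonds = [(0, 1, "single"), (1, 2, "double")]
    seq = serialize_graph(nodes, bonds)
    print(" ".join(seq))
    for prefix, target in next_token_pairs(seq)[:3]:
        print(prefix, "->", target)
```

Under this scheme a graph with n nodes and m edges takes 2n + 3m tokens plus four special tokens, so sequence length grows with the number of edges rather than with the n × n entries of an adjacency matrix. That is one way to read the abstract's claim of efficient encoding, particularly for sparse graphs.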