Heterogeneous Directed Hypergraph Neural Network over abstract syntax tree (AST) for Code Classification

📅 2023-05-07
🏛️ International Conference on Software Engineering and Knowledge Engineering
📈 Citations: 2
Influential: 0

🤖 AI Summary
Code classification depends on how well a model captures Abstract Syntax Tree (AST) structure. Existing AST-based Graph Neural Network (GNN) approaches capture only pairwise node relationships and neglect higher-order dependencies among nodes, such as those that share the same field or function call. Generic hypergraphs can encode such higher-order relations, but they are homogeneous and undirected, so they lose node types, edge types, and the direction between parent and child nodes. To address both gaps, we model the AST as a Heterogeneous Directed Hypergraph (HDHG) that explicitly encodes node types, directed parent-child relationships, and higher-order structural dependencies. On top of this representation, we design the Heterogeneous Directed Hypergraph Neural Network (HDHGN), which combines type-aware message passing with directed hypergraph convolution. In experiments on public Python and Java code classification benchmarks, HDHGN outperforms previous AST-based and GNN-based baselines, which supports jointly modeling higher-order, typed, and direction-aware structure for code understanding.
📝 Abstract
Code classification is a difficult problem in program understanding and automatic coding. Because programs have elusive syntax and complicated semantics, most existing studies build code representations for classification with techniques based on the abstract syntax tree (AST) and graph neural networks (GNNs). These techniques use the structural and semantic information of the code, but they consider only pairwise associations and neglect the high-order correlations that exist between nodes in the AST, which may lose structural information about the code. A general hypergraph can encode high-order data correlations, but it is homogeneous and undirected, so modeling an AST with it loses semantic and structural information such as node types, edge types, and the direction between child and parent nodes. In this study, we propose to represent the AST as a heterogeneous directed hypergraph (HDHG) and to process it with a heterogeneous directed hypergraph neural network (HDHGN) for code classification. Our method improves code understanding and can represent high-order data correlations beyond pairwise interactions. We evaluate HDHGN on public datasets of Python and Java programs. It outperforms previous AST-based and GNN-based methods, which demonstrates the capability of our model.
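A minimal sketch of how an AST might be turned into a heterogeneous directed hypergraph, using Python's standard `ast` module. It assumes one hyperedge per non-empty AST field, typed by the field name and directed from the parent node to the children under that field. The names `Hyperedge`, `HDHG`, and `build_hdhg` are hypothetical and chosen for illustration; the paper's exact construction may differ.

```python
import ast
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hyperedge:
    edge_type: str      # AST field name, e.g. "body" or "args" (assumed edge type)
    source: int         # id of the parent node
    targets: List[int]  # ids of all child nodes under that field


@dataclass
class HDHG:
    node_types: List[str] = field(default_factory=list)  # node id -> AST class name
    hyperedges: List[Hyperedge] = field(default_factory=list)


def build_hdhg(source_code: str) -> HDHG:
    """Parse Python source and collect typed nodes and directed hyperedges."""
    graph = HDHG()

    def visit(node: ast.AST) -> int:
        node_id = len(graph.node_types)
        graph.node_types.append(type(node).__name__)
        for field_name, value in ast.iter_fields(node):
            children = value if isinstance(value, list) else [value]
            child_ids = [visit(c) for c in children if isinstance(c, ast.AST)]
            if child_ids:
                # One hyperedge links the parent to every child in this field,
                # which is a relation among more than two nodes at once.
                graph.hyperedges.append(Hyperedge(field_name, node_id, child_ids))
        return node_id

    visit(ast.parse(source_code))
    return graph


if __name__ == "__main__":
    g = build_hdhg("def f(a, b):\n    return a + b\n")
    for edge in g.hyperedges:
        targets = [g.node_types[t] for t in edge.targets]
        print(edge.edge_type, g.node_types[edge.source], "->", targets)
```

A plain AST graph would store the two parameters of `f` as two separate parent-child edges. Here they share a single `args` hyperedge, so a model can treat the whole parameter list as one relation while still knowing which node is the parent.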
Problem

Research questions and friction points this paper is trying to address.

Modeling high-order correlations in AST for code classification
Addressing limitations of homogeneous undirected hypergraphs in AST representation
Reducing the loss of semantic and structural information in existing GNN methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Represents the AST as a heterogeneous directed hypergraph (HDHG)
Processes the hypergraph with a heterogeneous directed hypergraph neural network (HDHGN) for classification
Captures high-order correlations beyond pairwise interactions
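To make "beyond pairwise interactions" concrete, here is a toy two-stage propagation step over typed, directed hyperedges: each hyperedge first pools all of its member nodes, then every member reads the pooled message back. This is an illustrative sketch under stated assumptions, not the HDHGN layer from the paper. The fixed `type_offset` and `role_offset` vectors stand in for parameters a real model would learn, and `propagate` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Vector = List[float]
# Assumed layout of a typed, directed hyperedge: (edge_type, source_node, target_nodes)
Hyperedge = Tuple[str, int, List[int]]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def propagate(
    node_feats: List[Vector],
    hyperedges: List[Hyperedge],
    type_offset: Dict[str, Vector],
    role_offset: Dict[str, Vector],
) -> List[Vector]:
    """Run one node -> hyperedge -> node round and return new node features."""
    incoming: List[List[Vector]] = [[] for _ in node_feats]
    for edge_type, source, targets in hyperedges:
        # Direction awareness: source and target nodes get different offsets.
        members = [add(node_feats[source], role_offset["source"])]
        members += [add(node_feats[t], role_offset["target"]) for t in targets]
        # Heterogeneity: the pooled message is shifted by the hyperedge type.
        message = add(mean(members), type_offset[edge_type])
        for node in [source, *targets]:
            incoming[node].append(message)
    # Nodes that belong to no hyperedge keep their features unchanged.
    return [mean(msgs) if msgs else list(feats)
            for feats, msgs in zip(node_feats, incoming)]
```

Because the message is pooled over every member of the hyperedge at once, a child node receives information about its siblings in a single step, which a pairwise parent-child edge cannot provide.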