🤖 AI Summary
Entity classification in relational databases is often hindered by class imbalance, which degrades performance on minority classes. This work presents the first systematic study of this issue and proposes a relation-centric oversampling framework based on graph neural networks. The approach models a relational database as a heterogeneous graph, introduces a relation-aware gating mechanism to adaptively aggregate neighbor information per relation type, and designs a relation-guided synthesis strategy that preserves relational consistency in generated minority samples. Extensive experiments on 12 entity classification datasets show that the proposed method significantly outperforms existing techniques, achieving average improvements of 2.46% in balanced accuracy and 4.00% in G-Mean, thereby strengthening the representation and classification of minority classes.
📝 Abstract
To enable a fully data-driven learning paradigm on relational databases (RDBs), relational deep learning (RDL) has recently been proposed to structure an RDB as a heterogeneous entity graph and adopt a graph neural network (GNN) as the predictive model. However, existing RDL methods neglect the class imbalance inherent in relational data and risk under-representing minority entities, yielding models that are unusable in practice. In this work, we investigate, for the first time, the class imbalance problem in RDB entity classification and design the relation-centric minority synthetic over-sampling GNN (Rel-MOSS) to fill a critical void in the current literature. Specifically, to prevent minority-related information from being submerged by its majority counterparts, we design a relation-wise gating controller that modulates neighborhood messages from each individual relation type. Building on the relation-gated representations, we further propose a relation-guided minority synthesizer for over-sampling, which integrates entity relational signatures to maintain relational consistency. Extensive experiments on 12 entity classification datasets provide compelling evidence for the superiority of Rel-MOSS, which yields average improvements of 2.46% in Balanced Accuracy and 4.00% in G-Mean over SOTA RDL methods and classic methods for handling class imbalance.
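The abstract gives no equations, so the following is a minimal NumPy sketch of the two ideas as described: a per-relation sigmoid gate that modulates the aggregated neighbor message from each relation type, and a SMOTE-style interpolation between minority representations that synthesizes new minority samples. The function names, the scalar-gate parameterization, and the interpolation scheme are illustrative assumptions on my part, not the authors' actual Rel-MOSS implementation.

```python
import numpy as np


def relation_gated_aggregate(h_self, neighbor_msgs, gate_params):
    """Combine per-relation neighbor messages via learned sigmoid gates.

    h_self:        (d,) representation of the target entity
    neighbor_msgs: dict mapping relation type -> (d,) aggregated message
    gate_params:   dict mapping relation type -> (2d,) gate weight vector
                   (a hypothetical scalar-gate parameterization)
    """
    out = h_self.copy()
    for rel, msg in neighbor_msgs.items():
        # Gate conditioned on both the entity and this relation's message,
        # so each relation type's contribution can be amplified or suppressed.
        logit = gate_params[rel] @ np.concatenate([h_self, msg])
        gate = 1.0 / (1.0 + np.exp(-logit))  # sigmoid
        out += gate * msg
    return out


def synthesize_minority(minority_reps, n_new, rng):
    """SMOTE-style over-sampling: interpolate pairs of minority representations.

    minority_reps: (m, d) gated representations of minority entities
    n_new:         number of synthetic samples to generate
    """
    pairs = rng.integers(0, len(minority_reps), size=(n_new, 2))
    lam = rng.random((n_new, 1))  # per-sample interpolation coefficient
    return lam * minority_reps[pairs[:, 0]] + (1 - lam) * minority_reps[pairs[:, 1]]
```

With zero-initialized gate parameters every gate evaluates to 0.5, so each relation contributes half of its message; training would then push the gates toward relations that carry minority-discriminative signal, while the synthesizer operates in the gated representation space rather than on raw features.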