🤖 AI Summary
Graph Neural Networks (GNNs) often employ fully connected feature graphs to model feature interactions, yet such dense structures introduce spurious non-interacting edges that degrade both predictive performance and interpretability. Method: This paper addresses the structural selection problem in feature interaction modeling by proposing a sparse feature graph construction method grounded in the Minimum Description Length (MDL) principle. It provides an information-theoretic justification for sparsifying feature graphs, proving that retaining only pairwise interacting edges follows Occam's Razor and yields representations that are maximally compact yet sufficient. Contribution/Results: Systematic evaluation on synthetic data confirms the critical role of interacting edges and the detrimental impact of non-interacting ones. Experiments demonstrate that MDL-guided sparse feature graphs significantly improve GNN accuracy and interpretability on feature interaction tasks while remaining computationally efficient.
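The MDL argument can be stated with the standard two-part code (a generic formulation; the paper's exact notation may differ): select the feature graph $G$ that minimizes the total description length of the graph plus the data encoded given that graph.

```latex
L(G, \mathcal{D}) \;=\; \underbrace{L(G)}_{\text{cost of encoding the graph}} \;+\; \underbrace{L(\mathcal{D} \mid G)}_{\text{cost of encoding the data given } G}
```

Under this reading, a complete graph inflates $L(G)$ without reducing $L(\mathcal{D} \mid G)$, because non-interacting edges carry no predictive information; the sparse graph containing exactly the interacting edges is then the shortest sufficient description.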
📝 Abstract
Feature interaction is crucial in predictive machine learning models, as it captures relationships between features that influence model performance. In this work, we focus on pairwise interactions and investigate their importance in constructing feature graphs for Graph Neural Networks (GNNs). Rather than proposing new methods, we leverage existing GNN models and tools to explore the relationship between feature graph structures and their effectiveness in modeling interactions. Through experiments on synthetic datasets, we find that edges between interacting features are essential for enabling GNNs to model feature interactions effectively, and that including non-interacting edges acts as noise that degrades model performance. Furthermore, we provide theoretical support for sparse feature graph selection using the Minimum Description Length (MDL) principle: we prove that feature graphs retaining only the necessary interaction edges yield a more efficient and interpretable representation than complete graphs, in line with Occam's Razor. Our findings offer both theoretical insights and practical guidelines for designing feature graphs that improve the performance and interpretability of GNN models.
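To make the contrast between the two graph structures concrete, here is a minimal sketch of building a complete feature graph versus a sparse one that keeps only edges between interacting feature pairs. The function names and the example interaction list are illustrative, not taken from the paper; in practice the interacting pairs would come from domain knowledge or an interaction-detection procedure.

```python
from itertools import combinations

def complete_feature_graph(n_features):
    """Dense baseline: every pair of features is connected by an edge."""
    return set(combinations(range(n_features), 2))

def sparse_feature_graph(n_features, interactions):
    """Keep only edges between features known to interact pairwise.

    Edges are normalized to (i, j) with i < j; self-loops are dropped.
    """
    return {
        tuple(sorted(pair))
        for pair in interactions
        if pair[0] != pair[1] and max(pair) < n_features
    }

# Illustrative setup: 5 features where the target depends on the
# products x0*x1 and x2*x3, while x4 interacts with nothing.
dense = complete_feature_graph(5)            # 10 edges
sparse = sparse_feature_graph(5, [(0, 1), (2, 3)])  # 2 edges
```

The sparse graph is always a subgraph of the dense one; the paper's experiments suggest the 8 extra edges in `dense` act as noise rather than signal for this kind of target.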