🤖 AI Summary
Existing graph dataset condensation methods exhibit limited generalization across tasks and domains, often failing to preserve original model performance. This work proposes TGCC, a transferable graph dataset condensation framework that introduces causal invariance into graph condensation for the first time. TGCC extracts causally invariant features in the spatial domain through causal intervention, integrates enhanced condensation operations, and leverages spectral-domain contrastive learning to generate compact datasets that retain the original graph’s causal structure. Extensive experiments on five public benchmarks and a newly constructed FinReport dataset demonstrate that TGCC improves performance by up to 13.41% in cross-task and cross-domain settings and achieves state-of-the-art results in five out of six single-task scenarios.
📝 Abstract
The increasing scale of graph datasets has significantly improved the performance of graph representation learning methods, but it has also introduced substantial training challenges. Graph dataset condensation techniques have emerged to compress large datasets into smaller yet information-rich ones while maintaining similar test performance. However, these methods strictly require downstream applications to match the original dataset and task, a requirement that often fails to hold in cross-task and cross-domain scenarios. To address these challenges, we propose a novel causal-invariance-based and transferable graph dataset condensation method, named \textbf{TGCC}, which provides effective and transferable condensed datasets. Specifically, to preserve domain-invariant knowledge, we first extract domain causal-invariant features from the spatial domain of the graph using causal interventions. Then, to fully capture the structural and feature information of the original graph, we perform enhanced condensation operations. Finally, through spectral-domain enhanced contrastive learning, we inject the causal-invariant features into the condensed graph, ensuring that the condensed graph retains the causal information of the original graph. Experimental results on five public datasets and our novel \textbf{FinReport} dataset demonstrate that TGCC achieves up to a 13.41\% improvement in complex cross-task and cross-domain scenarios compared to existing methods, and achieves state-of-the-art performance on 5 out of 6 datasets in the single-dataset, single-task scenario.
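The abstract's final step (spectral-domain contrastive learning that aligns the condensed graph with the original) can be sketched in rough outline. Everything below is an illustrative assumption, not the paper's implementation: the spectral signature (smallest eigenvalues of the normalized Laplacian), the InfoNCE-style loss, and all function names are stand-ins for whatever TGCC actually uses.

```python
# Hedged sketch: align a condensed graph with the original in the spectral
# domain via a contrastive loss. All design choices here are assumptions.
import numpy as np

def spectral_signature(adj, k=4):
    """Smallest-k eigenvalues of the symmetric normalized Laplacian,
    used as a compact spectral descriptor (assumes no isolated nodes)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.where(deg > 0, deg, 1.0))
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(lap)[:k]  # ascending order

def spectral_contrastive_loss(orig_sig, cond_sig, negative_sigs, tau=0.5):
    """InfoNCE-style loss: pull the condensed graph's spectrum toward the
    original's, push it away from spectra of unrelated (negative) graphs."""
    def sim(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    pos = np.exp(sim(orig_sig, cond_sig) / tau)
    negs = sum(np.exp(sim(orig_sig, n) / tau) for n in negative_sigs)
    return -np.log(pos / (pos + negs))
```

In this toy form the loss would be minimized (alongside the condensation objective) with respect to the condensed graph's adjacency, encouraging the small synthetic graph to preserve the original's spectral, and thus structural, character.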