🤖 AI Summary
This work addresses a limitation of the generic graph augmentation strategies used in standard graph contrastive learning: they often disrupt the semantic structure of mathematical expression graphs, particularly for small, structurally compact formulas. To mitigate this, the authors propose Variable Substitution, a domain-specific augmentation technique tailored to mathematical information retrieval. The technique preserves the core algebraic structure and semantic meaning of formulas within a graph contrastive learning framework. It is described as the first structure-preserving variable substitution mechanism used in graph contrastive learning, and it is intended to reduce the semantic distortion introduced by augmentation. In experiments, applying the technique to a classic graph contrastive retrieval model significantly improves mathematical formula retrieval performance compared to generic augmentation strategies.
📝 Abstract
This paper introduces Variable Substitution as a domain-specific graph augmentation technique for graph contrastive learning (GCL) applied to mathematical formula retrieval. Standard GCL augmentation techniques often distort the semantic meaning of mathematical formulas, particularly when the formula graphs are small and highly structured. Variable Substitution, in contrast, preserves the core algebraic relationships and formula structure. To demonstrate the effectiveness of our technique, we apply it to a classic GCL-based retrieval model. Experiments show that this straightforward approach significantly improves retrieval performance compared to generic augmentation strategies. We release the code on GitHub: https://github.com/lazywulf/formula_ret_aug
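To make the idea concrete, the following is a minimal sketch of what a structure-preserving variable substitution could look like on a small formula graph. It is an illustration under stated assumptions, not the authors' implementation: the `FormulaGraph` container, the `substitute_variables` function, and the replacement symbol pool are all hypothetical names, and the abstract does not specify how variables are selected or renamed. The sketch assumes that every occurrence of a variable is renamed consistently and that distinct variables stay distinct, so the edges and operator nodes are left untouched.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FormulaGraph:
    """Hypothetical container for an expression graph (not from the paper)."""
    labels: tuple          # node labels, e.g. ("=", "+", "x", "y", "x")
    edges: tuple           # (parent_index, child_index) pairs
    variables: frozenset   # indices of the nodes that are variables


def substitute_variables(graph, pool, rng):
    """Return an augmented view in which variable names are renamed.

    Assumption: the renaming is a one-to-one mapping applied to every
    occurrence, so repeated variables still match and different
    variables never collapse into one symbol.
    """
    names = sorted({graph.labels[i] for i in graph.variables})
    # Sampling without replacement keeps the mapping one-to-one.
    mapping = dict(zip(names, rng.sample(pool, len(names))))
    new_labels = tuple(
        mapping[label] if i in graph.variables else label
        for i, label in enumerate(graph.labels)
    )
    # Edges and the set of variable positions are copied unchanged.
    return FormulaGraph(new_labels, graph.edges, graph.variables)


# Example: the formula x + y = x as a tiny operator tree.
original = FormulaGraph(
    labels=("=", "+", "x", "y", "x"),
    edges=((0, 1), (0, 4), (1, 2), (1, 3)),
    variables=frozenset({2, 3, 4}),
)
augmented = substitute_variables(original, ["a", "b", "c", "u", "v"], random.Random(0))
print(augmented.labels)
```

In a contrastive setup, `original` and `augmented` would serve as a positive pair. Unlike generic augmentations such as node dropping or edge perturbation, this view differs from the original only in its variable labels.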