StructCoh: Structured Contrastive Learning for Context-Aware Text Semantic Matching

📅 2025-09-02
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Semantic matching of text requires jointly modeling hierarchical syntactic structure and fine-grained semantic distinctions, yet prevailing pretrained language models struggle to capture structured, cross-sentence interactions. To address this, we propose a context-aware dual-graph encoding framework: (1) constructing dual semantic graphs by integrating dependency parsing and topic modeling; (2) propagating structural features via Graph Isomorphism Networks (GIN); and (3) introducing a joint node-level and graph-level contrastive learning objective, enhanced by explicit and implicit negative sampling to refine the representation space. This work is the first to synergistically integrate structure-aware graph encoding with hierarchical contrastive learning for semantic matching. Evaluated on three legal document matching benchmarks and an academic plagiarism detection dataset, the method achieves state-of-the-art performance, e.g., 86.7% F1 on legal statute matching, an absolute improvement of 6.2%.

๐Ÿ“ Abstract
Text semantic matching requires nuanced understanding of both structural relationships and fine-grained semantic distinctions. While pre-trained language models excel at capturing token-level interactions, they often overlook hierarchical structural patterns and struggle with subtle semantic discrimination. In this paper, we propose StructCoh, a graph-enhanced contrastive learning framework that synergistically combines structural reasoning with representation space optimization. Our approach features two key innovations: (1) A dual-graph encoder constructs semantic graphs via dependency parsing and topic modeling, then employs graph isomorphism networks to propagate structural features across syntactic dependencies and cross-document concept nodes. (2) A hierarchical contrastive objective enforces consistency at multiple granularities: node-level contrastive regularization preserves core semantic units, while graph-aware contrastive learning aligns inter-document structural semantics through both explicit and implicit negative sampling strategies. Experiments on three legal document matching benchmarks and academic plagiarism detection datasets demonstrate significant improvements over state-of-the-art methods. Notably, StructCoh achieves 86.7% F1-score (+6.2% absolute gain) on legal statute matching by effectively identifying argument structure similarities.
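The hierarchical contrastive objective described in the abstract can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the InfoNCE form, cosine similarity, the temperature `tau`, and the weight `lam` balancing node-level and graph-level terms are all assumptions.

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE loss for one anchor: pull the positive embedding
    close and push negatives away, using cosine similarity / tau."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                      # positive sits at index 0

def hierarchical_loss(node_pairs, graph_pair, negatives, lam=0.5):
    """Node-level terms (aligned semantic units) plus one graph-level
    term (whole-document embeddings), combined with weight lam."""
    node_loss = np.mean([info_nce(a, p, negatives) for a, p in node_pairs])
    graph_loss = info_nce(graph_pair[0], graph_pair[1], negatives)
    return node_loss + lam * graph_loss
```

In this sketch the loss shrinks as a positive pair's embeddings align and grows as they drift toward the negatives, which is the behavior the paper's explicit and implicit negative sampling is meant to exploit.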
Problem

Research questions and friction points this paper is trying to address.

Enhancing text semantic matching with structural relationships
Addressing pre-trained language models' neglect of hierarchical structural patterns
Improving fine-grained semantic discrimination through graph contrastive learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-graph encoder with dependency parsing and topic modeling
Graph isomorphism networks propagate structural features
Hierarchical contrastive objective with multi-granularity consistency
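As a concrete illustration of the GIN propagation step listed above, one layer over a toy dependency graph might look like the sketch below. The two-layer ReLU MLP, random weights, and sum-pooling readout are assumptions for illustration, not the paper's reported architecture.

```python
import numpy as np

def gin_layer(H, A, W1, b1, W2, b2, eps=0.0):
    """One Graph Isomorphism Network (GIN) layer: each node aggregates
    (1 + eps) * its own features plus the sum of its neighbors',
    then passes the result through a two-layer ReLU MLP."""
    agg = (1.0 + eps) * H + A @ H          # injective sum aggregation
    hidden = np.maximum(agg @ W1 + b1, 0)  # ReLU
    return hidden @ W2 + b2

# Toy dependency graph over 3 tokens: symmetric edges 0-1 and 1-2.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
rng = np.random.default_rng(0)
H = rng.standard_normal((3, 4))                     # node features
W1, b1 = rng.standard_normal((4, 8)), np.zeros(8)
W2, b2 = rng.standard_normal((8, 4)), np.zeros(4)

H_out = gin_layer(H, A, W1, b1, W2, b2)
graph_repr = H_out.sum(axis=0)   # sum-pool to a graph-level vector
```

Stacking such layers propagates structural features along syntactic dependencies, and the pooled `graph_repr` is the kind of document-level vector the graph-aware contrastive objective would compare.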
Chao Xue
Beihang University
Natural Language Processing · Large Language Model
Ziyuan Gao
University College London, London, UK