StructCoh: Structured Contrastive Learning for Context-Aware Text Semantic Matching

📅 2025-09-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Semantic matching of text requires jointly modeling hierarchical syntactic structure and fine-grained semantic distinctions, yet prevailing pretrained language models struggle to capture structured, cross-sentence interactions. To address this, we propose a context-aware dual-graph encoding framework that (1) constructs dual semantic graphs by combining dependency parsing with topic modeling; (2) propagates structural features with Graph Isomorphism Networks (GIN); and (3) introduces a joint node-level and graph-level contrastive objective, strengthened by explicit and implicit negative sampling, to refine the representation space. We present this as the first work to combine structure-aware graph encoding with hierarchical contrastive learning for semantic matching. Evaluated on three legal document matching benchmarks and academic plagiarism detection datasets, the method reports state-of-the-art results, e.g., 86.7% F1 on legal statute matching, an absolute gain of 6.2 percentage points.
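As a rough illustration of step (1) in the summary above, the sketch below builds one graph over two documents from dependency arcs and topic assignments. It is a minimal sketch under stated assumptions, not the authors' construction: the arcs and the token-to-topic mapping are assumed to come from an external parser and topic model, tokens are identified by their surface string, and `build_dual_graph` is a hypothetical name. Shared topic nodes are what connect the two documents, standing in for the cross-document concept nodes.

```python
from collections import defaultdict


def build_dual_graph(doc_a_arcs, doc_b_arcs, topic_of):
    """Hypothetical sketch: merge two dependency graphs through shared topic nodes.

    doc_*_arcs: (head, dependent) token pairs, assumed to come from a dependency parser.
    topic_of:   token -> topic id, assumed to come from a topic model.
    Returns an undirected adjacency dict keyed by (doc_id, token) or ("TOPIC", topic).
    """
    adj = defaultdict(set)

    def add_edge(u, v):
        adj[u].add(v)
        adj[v].add(u)

    # Syntactic edges; the doc id prefix keeps identical words in A and B distinct.
    # Simplification: repeated words within one document collapse into one node.
    for doc_id, arcs in (("A", doc_a_arcs), ("B", doc_b_arcs)):
        for head, dep in arcs:
            add_edge((doc_id, head), (doc_id, dep))

    # Concept edges: each token with a topic links to a single shared topic node,
    # so tokens from A and B that share a topic end up two hops apart.
    for doc_id, token in list(adj):
        topic = topic_of.get(token)
        if topic is not None:
            add_edge((doc_id, token), ("TOPIC", topic))
    return dict(adj)


# Toy usage with hand-written arcs and topics (illustrative data only).
arcs_a = [("signed", "tenant"), ("signed", "lease")]
arcs_b = [("breached", "landlord"), ("breached", "contract")]
topics = {"lease": "agreement", "contract": "agreement", "tenant": "party", "landlord": "party"}
graph = build_dual_graph(arcs_a, arcs_b, topics)
print(sorted(graph[("TOPIC", "agreement")]))  # [('A', 'lease'), ('B', 'contract')]
```

Routing cross-document links through topic nodes, rather than adding direct token-to-token edges, keeps the number of edges linear in the number of tokens.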

📝 Abstract

Text semantic matching requires nuanced understanding of both structural relationships and fine-grained semantic distinctions. While pre-trained language models excel at capturing token-level interactions, they often overlook hierarchical structural patterns and struggle with subtle semantic discrimination. In this paper, we propose StructCoh, a graph-enhanced contrastive learning framework that synergistically combines structural reasoning with representation space optimization. Our approach features two key innovations: (1) A dual-graph encoder constructs semantic graphs via dependency parsing and topic modeling, then employs graph isomorphism networks to propagate structural features across syntactic dependencies and cross-document concept nodes. (2) A hierarchical contrastive objective enforces consistency at multiple granularities: node-level contrastive regularization preserves core semantic units, while graph-aware contrastive learning aligns inter-document structural semantics through both explicit and implicit negative sampling strategies. Experiments on three legal document matching benchmarks and academic plagiarism detection datasets demonstrate significant improvements over state-of-the-art methods. Notably, StructCoh achieves 86.7% F1-score (+6.2% absolute gain) on legal statute matching by effectively identifying argument structure similarities.
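The abstract names graph isomorphism networks as the propagation step. The standard GIN update is h_v^(k) = MLP((1 + ε)·h_v^(k-1) + Σ_{u∈N(v)} h_u^(k-1)). The pure-Python sketch below implements one such layer plus a sum readout to a graph-level vector; it is an illustration, not the paper's code. As assumptions, the MLP is reduced to a single linear layer with ReLU, ε is a fixed argument rather than learned, and `gin_layer` and `sum_readout` are hypothetical names.

```python
def gin_layer(adj, h, weight, bias, eps=0.0):
    """One GIN step: h_v' = ReLU(W @ ((1 + eps) * h_v + sum_{u in N(v)} h_u) + b).

    adj:    node -> iterable of neighbour nodes (e.g. dependency or topic edges)
    h:      node -> feature vector (list of floats)
    weight: list of rows (out_dim x in_dim); bias: list of out_dim floats
    """
    out = {}
    for v, h_v in h.items():
        # Sum aggregation: self term scaled by (1 + eps) plus every neighbour's features.
        agg = [(1.0 + eps) * x for x in h_v]
        for u in adj.get(v, ()):
            agg = [a + x for a, x in zip(agg, h[u])]
        # Single linear layer + ReLU standing in for GIN's MLP (simplifying assumption).
        z = [sum(w * a for w, a in zip(row, agg)) + b for row, b in zip(weight, bias)]
        out[v] = [max(0.0, x) for x in z]
    return out


def sum_readout(h):
    """Graph-level vector by summing node vectors (assumed readout; the paper may differ)."""
    dim = len(next(iter(h.values())))
    return [sum(vec[i] for vec in h.values()) for i in range(dim)]


# Toy path graph a - b - c with 2-d features and an identity weight matrix.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
h0 = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
h1 = gin_layer(adj, h0, weight=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
print(h1)               # {'a': [1.0, 1.0], 'b': [2.0, 2.0], 'c': [1.0, 2.0]}
print(sum_readout(h1))  # [4.0, 5.0]
```

Sum aggregation is what distinguishes GIN from mean- or max-pooling GNNs: it keeps neighbourhoods with different multisets of features distinguishable, which is why it suits propagating structural features.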

Problem

Research questions and friction points this paper is trying to address.

Enhancing text semantic matching with structural relationships

Addressing language models' neglect of hierarchical structural patterns

Improving fine-grained semantic discrimination through graph contrastive learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-graph encoder with dependency parsing and topic modeling

Graph isomorphism networks propagate structural features

Hierarchical contrastive objective with multi-granularity consistency
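The hierarchical contrastive objective can be sketched as two InfoNCE terms, one over node embeddings and one over graph embeddings, summed with a weight. This is a minimal sketch under assumptions, not the authors' loss: cosine similarity with temperature `tau`, the weight `lam`, and all function names are hypothetical. "Explicit" negatives are taken to be labeled non-matching documents and "implicit" negatives to be other documents in the same batch, a common reading that the summary does not confirm.

```python
import math


def cosine(u, v):
    """Cosine similarity; the small constant guards against zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)


def info_nce(anchor, positive, negatives, tau=0.1):
    """-log(exp(s_pos/tau) / sum over positive and negatives of exp(s/tau))."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # log-sum-exp with max subtraction for numerical stability
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]


def hierarchical_loss(node_pairs, node_negatives, g_anchor, g_positive,
                      explicit_negatives, implicit_negatives, lam=0.5, tau=0.1):
    """Graph-level InfoNCE plus lam times the mean node-level InfoNCE (hypothetical weighting)."""
    node_loss = sum(info_nce(a, p, node_negatives, tau) for a, p in node_pairs) / len(node_pairs)
    # Both negative pools are simply pooled in the graph-level denominator.
    graph_loss = info_nce(g_anchor, g_positive, explicit_negatives + implicit_negatives, tau)
    return graph_loss + lam * node_loss


# An aligned positive with an opposite negative gives a near-zero loss;
# swapping their roles gives a loss of about 2 / tau = 20.
print(round(info_nce([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]]), 6))  # 0.0
print(round(info_nce([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]]), 6))  # 20.0
```

Averaging the node-level term and pooling both negative types into one denominator are design choices made here for brevity; the paper's exact weighting and negative construction are not given on this page.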

🔎 Similar Papers

Noise Contrastive Estimation-based Matching Framework for Low-Resource Security Attack Pattern Recognition

2024-01-18 · Findings · Citations: 5
