Research on feature fusion and multimodal patent text based on graph attention network

📅 2025-05-26
📈 Citations: 0
Influential: 0

🤖 AI Summary
Addressing three key challenges in patent text semantic mining—difficult cross-modal feature fusion, low efficiency in long-text modeling, and hierarchical semantic discontinuity—this paper proposes HGM-Net. The framework introduces Hierarchical Contrastive Learning (HCL) to enforce semantic consistency across word–sentence–paragraph granularities; a Multimodal Graph Attention Network (M-GAT) to jointly model heterogeneous relationships among patent classification codes, citation links, and textual content; and Multi-granularity Sparse Attention (MSA) to enhance computational efficiency for long documents. These components synergistically enable dynamic cross-modal fusion and hierarchical semantic alignment. Evaluated on patent classification and similarity matching tasks, HGM-Net significantly outperforms existing deep learning methods, improving examination efficiency by 23.6% and boosting F1-score for technical association identification by 18.4%.
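The hierarchical contrastive idea in HCL — pulling a passage's representations at different granularities toward each other — can be illustrated with a generic InfoNCE-style loss. This is a minimal sketch with toy embeddings; the paper's actual encoders, loss weighting, and dynamic masking are not specified here.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: anchor i (e.g. a sentence embedding) should be
    closer to positives[i] (e.g. its paragraph) than to other rows."""
    # L2-normalize so dot products become cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # the matching positive for anchor i sits on the diagonal
    return -np.log(np.diag(probs)).mean()

rng = np.random.default_rng(0)
sent = rng.normal(size=(8, 16))                  # toy sentence-level embeddings
para = sent + 0.05 * rng.normal(size=(8, 16))    # aligned paragraph-level views
loss_aligned = info_nce(sent, para)
loss_random = info_nce(sent, rng.normal(size=(8, 16)))
assert loss_aligned < loss_random  # consistent granularities yield lower loss
```

The contrast between aligned and random positives is what drives word–sentence–paragraph consistency during training.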

📝 Abstract
Aiming at the problems of difficult cross-modal feature fusion, low efficiency of long-text modeling, and lack of hierarchical semantic coherence in patent text semantic mining, this study proposes HGM-Net, a deep learning framework that integrates Hierarchical Contrastive Learning (HCL), a Multi-modal Graph Attention Network (M-GAT), and Multi-Granularity Sparse Attention (MSA). HCL builds dynamic masking, contrastive, and cross-structural similarity constraints at the word, sentence, and paragraph levels to strengthen the local semantic and global thematic consistency of patent text; M-GAT models patent classification codes, citation relations, and text semantics as a heterogeneous graph structure and achieves dynamic fusion of multi-source features through cross-modal gated attention; MSA adopts a hierarchical sparsity strategy to optimize the computational efficiency of long-text modeling at word, phrase, sentence, and paragraph granularity. Experiments show that the framework demonstrates significant advantages over existing deep learning methods on tasks such as patent classification and similarity matching, providing a solution of both theoretical innovation and practical value for improving patent examination efficiency and mining technology relevance.
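The cross-modal gated attention step in M-GAT can be sketched as a learned per-dimension gate that blends the text encoding with the graph-side encoding (classification codes and citations). The weight shapes and naming below are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h_text, h_graph, W, b):
    """Cross-modal gated fusion: a gate computed from both modalities
    decides, per dimension, how much text vs. graph signal to keep."""
    gate = sigmoid(np.concatenate([h_text, h_graph], axis=-1) @ W + b)
    return gate * h_text + (1.0 - gate) * h_graph

rng = np.random.default_rng(1)
d = 8
h_text = rng.normal(size=(4, d))    # text-encoder output for 4 patents
h_graph = rng.normal(size=(4, d))   # graph-attention output (codes/citations)
W = rng.normal(size=(2 * d, d)) * 0.1
b = np.zeros(d)
fused = gated_fusion(h_text, h_graph, W, b)
assert fused.shape == (4, d)
# each fused value is a convex combination of the two modality values
lo = np.minimum(h_text, h_graph)
hi = np.maximum(h_text, h_graph)
assert np.all(fused >= lo - 1e-9) and np.all(fused <= hi + 1e-9)
```

Because the gate sits in (0, 1), the fused representation always interpolates between the modalities rather than letting one overwrite the other.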
Problem

Research questions and friction points this paper is trying to address.

Cross-modal feature fusion in patent text analysis
Low efficiency in long text modeling
Lack of hierarchical semantic coherence in patent text mining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Contrastive Learning for cross-granularity semantic consistency
Multi-modal Graph Attention Network for feature fusion
Multi-Granularity Sparse Attention for efficient modeling
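The sparse-attention contribution can be pictured as an attention mask that combines a fine-grained local window with coarse-grained "summary" positions standing in for sentence/paragraph structure. The window and stride sizes below are arbitrary toy values, not the paper's settings:

```python
import numpy as np

def sparse_mask(n_tokens, window=2, global_every=4):
    """Hierarchical sparse attention mask: each token attends to a local
    window (fine granularity) plus periodic summary tokens that stand in
    for sentence/paragraph positions (coarse granularity)."""
    mask = np.zeros((n_tokens, n_tokens), dtype=bool)
    for i in range(n_tokens):
        lo, hi = max(0, i - window), min(n_tokens, i + window + 1)
        mask[i, lo:hi] = True                 # local window
    summaries = np.arange(0, n_tokens, global_every)
    mask[:, summaries] = True                 # every token sees summary tokens
    mask[summaries, :] = True                 # summary tokens see every token
    return mask

m = sparse_mask(16)
assert m.sum() < 16 * 16          # strictly sparser than full attention
assert np.all(np.diag(m))         # every token still attends to itself
```

For long patent documents this kind of mask replaces the quadratic full-attention pattern with a near-linear number of active entries, which is the efficiency gain MSA targets.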