MHSNet: An MoE-based Hierarchical Semantic Representation Network for Accurate Duplicate Resume Detection with Large Language Model

📅 2025-08-19
🤖 AI Summary
To address the challenge of duplicate resume detection arising from structural heterogeneity, semantic complexity, and information incompleteness in third-party resume sources, this paper proposes a state-aware hierarchical Mixture-of-Experts (MoE) semantic representation framework. The method integrates sparse and dense representations by fine-tuning the BGE-M3 foundation model with contrastive learning and incorporates an expert routing mechanism to enable multi-granularity semantic modeling, thereby significantly improving robustness to incomplete resumes. It further supports dynamic state awareness to enhance semantic matching accuracy between cross-source resumes and enterprise talent databases. Experimental evaluation on a real-world, complex resume dataset demonstrates that the proposed approach achieves an 8.2% improvement in F1-score over mainstream baselines, with substantially higher duplicate detection accuracy and generalization capability compared to existing methods.

📝 Abstract
To maintain the company's talent pool, recruiters need to continuously search for resumes on third-party websites (e.g., LinkedIn, Indeed). However, the fetched resumes are often incomplete and inaccurate. To improve the quality of third-party resumes and enrich the company's talent pool, it is essential to detect duplicates between the fetched resumes and those already in the talent pool. Such duplicate detection is challenging due to the semantic complexity, structural heterogeneity, and information incompleteness of resume texts. To this end, we propose MHSNet, a multi-level identity verification framework that fine-tunes BGE-M3 using contrastive learning. With the fine-tuned BGE-M3, a Mixture-of-Experts (MoE) generates multi-level sparse and dense representations for resumes, enabling the computation of the corresponding multi-level semantic similarities. Moreover, a state-aware MoE is employed in MHSNet to handle diverse incomplete resumes. Experimental results verify the effectiveness of MHSNet.
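The abstract does not spell out the contrastive objective. A common choice when fine-tuning embedding models is InfoNCE with in-batch negatives, and the sketch below assumes that objective purely for illustration. The function names, the temperature value, and the toy 2-d vectors are hypothetical and do not come from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss with in-batch negatives (assumed objective).

    anchors[i] and positives[i] are embeddings of two resumes that belong
    to the same person; positives[j] with j != i act as negatives.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Toy embeddings: in `aligned`, each positive is close to its own anchor;
# in `swapped`, each positive is close to the other anchor instead.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = info_nce_loss(anchors, aligned)
loss_swapped = info_nce_loss(anchors, swapped)
```

Under this assumed objective, the loss is small when duplicate resumes sit close together in embedding space and large when they are closer to other people's resumes, which is the property contrastive fine-tuning tries to enforce.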
Problem

Research questions and friction points this paper is trying to address.

Detecting duplicate resumes with semantic complexity and heterogeneity
Addressing information incompleteness in third-party fetched resumes
Improving talent pool quality through accurate resume deduplication
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tunes BGE-M3 using contrastive learning
Employs Mixture-of-Experts for multi-level representations
Generates sparse and dense semantic similarity features
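The points above can be illustrated with a deliberately simplified sketch of how per-level sparse and dense similarities might be combined when some resume sections are missing. This is not the paper's architecture: the section names, the fixed gate logits, the mixing weight `alpha`, and the masking rule that stands in for "state awareness" are all assumptions made for illustration. The sparse score follows the general idea of lexical matching (summing weight products over shared tokens).

```python
import math

def dense_sim(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def sparse_sim(a, b):
    """Lexical-style score: sum of weight products over shared tokens."""
    return sum(weight * b[token] for token, weight in a.items() if token in b)

def gated_similarity(resume_a, resume_b, level_logits, alpha=0.5):
    """Weighted mix of per-level similarities (hypothetical gating rule).

    Each resume maps a level name to (dense_vector, sparse_weights), or to
    None when that section is missing. The gate is a softmax over
    level_logits restricted to levels present in both resumes, so missing
    sections receive zero weight instead of dragging the score down.
    """
    present = [
        level for level in level_logits
        if resume_a.get(level) is not None and resume_b.get(level) is not None
    ]
    if not present:
        return 0.0
    peak = max(level_logits[level] for level in present)
    exps = {level: math.exp(level_logits[level] - peak) for level in present}
    total = sum(exps.values())
    score = 0.0
    for level in present:
        dense_a, sparse_a = resume_a[level]
        dense_b, sparse_b = resume_b[level]
        level_score = (alpha * dense_sim(dense_a, dense_b)
                       + (1 - alpha) * sparse_sim(sparse_a, sparse_b))
        score += exps[level] / total * level_score
    return score

# Toy data: `partial` is the same person as `full` but lacks the experience section.
full = {
    "education": ([1.0, 0.0], {"university": 1.0}),
    "experience": ([0.0, 1.0], {"engineer": 1.0}),
}
partial = {"education": ([1.0, 0.0], {"university": 1.0}), "experience": None}
empty = {"education": None, "experience": None}
logits = {"education": 0.0, "experience": 0.0}

score_full = gated_similarity(full, full, logits)
score_partial = gated_similarity(full, partial, logits)
score_empty = gated_similarity(full, empty, logits)
```

In a trained model the gate logits would presumably be produced by a learned router conditioned on which sections are present, rather than fixed by hand as they are here.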
Yu Li
Hangzhou Dianzi University
Zulong Chen
Director, Alibaba Group
Machine Learning, Large Language Model, Search & Recommendation, NLP
Wenjian Xu
Zhejiang University of Science and Technology
Hong Wen
Alibaba Group
Yipeng Yu
Taotian, Alibaba Group
Man Lung Yiu
Professor, Hong Kong Polytechnic University
Database
Yuyu Yin
Hangzhou Dianzi University