ConFit v2: Improving Resume-Job Matching using Hypothetical Resume Embedding and Runner-Up Hard-Negative Mining

📅 2025-02-17
🤖 AI Summary
In resume–job matching, interaction labels are extremely sparse because candidates apply to only a few positions, and this sparsity severely limits recommendation performance. To address it, we propose a contrastive learning framework that combines generative data augmentation with high-quality hard-negative mining. Specifically, we use large language models (LLMs) to generate hypothetical reference resumes for job posts, alleviating the scarcity of ground-truth annotations, and we introduce a Runner-Up strategy that automatically mines semantically close hard negatives from unlabeled resume/job pairs. Built on a dual-tower encoder architecture, the method achieves state-of-the-art results on two real-world datasets, outperforming ConFit and prior methods, including BM25 and OpenAI's text-embedding-003, with an average absolute improvement of 13.8% in recall and 17.5% in nDCG across job-ranking and resume-ranking tasks.
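
The contrastive objective at the core of this framework can be sketched as follows. This is a minimal illustration assuming a standard InfoNCE loss over dot-product scores from two encoders: each job's paired resume is the positive, and the other in-batch resumes plus any mined hard negatives are the negatives. The function name `info_nce_loss`, the `temperature` default, and the plain-list embeddings are illustrative assumptions, not ConFit v2's actual implementation.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(
    job_embs: List[List[float]],
    resume_embs: List[List[float]],
    hard_neg_embs: List[List[List[float]]],
    temperature: float = 0.05,  # assumed value, not taken from the paper
) -> float:
    """Average InfoNCE loss where job i's positive is resume i (assumed setup).

    Negatives for job i are the other in-batch resumes plus the extra
    hard negatives listed in hard_neg_embs[i].
    """
    total = 0.0
    for i, job in enumerate(job_embs):
        # Scores against every in-batch resume; index i is the positive.
        logits = [dot(job, r) / temperature for r in resume_embs]
        # Append scores for this job's mined hard negatives.
        logits += [dot(job, h) / temperature for h in hard_neg_embs[i]]
        # Numerically stable log-sum-exp, then negative log-softmax of the positive.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(job_embs)


if __name__ == "__main__":
    jobs = [[1.0, 0.0], [0.0, 1.0]]
    resumes = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce_loss(jobs, resumes, [[], []], temperature=1.0))
    # A hard negative as similar as the true match makes the task harder.
    print(info_nce_loss(jobs, resumes, [[[1.0, 0.0]], []], temperature=1.0))
```

Adding a hard negative that scores as high as the positive raises the loss, which is what pushes the encoders to separate near-miss resumes from the true match.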

📝 Abstract
A reliable resume-job matching system helps a company recommend suitable candidates from a pool of resumes and helps a job seeker find relevant jobs from a list of job posts. However, since job seekers apply to only a few jobs, interaction labels in resume-job datasets are sparse. We introduce ConFit v2, an improvement over ConFit that tackles this sparsity problem. We propose two techniques to enhance the encoder's contrastive training process: augmenting job data with hypothetical reference resumes generated by a large language model, and creating high-quality hard negatives from unlabeled resume/job pairs using a novel hard-negative mining strategy. We evaluate ConFit v2 on two real-world datasets and demonstrate that it outperforms ConFit and prior methods (including BM25 and OpenAI text-embedding-003), achieving an average absolute improvement of 13.8% in recall and 17.5% in nDCG across job-ranking and resume-ranking tasks.
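
The abstract reports gains in recall and nDCG for both ranking directions. As a rough illustration of what those metrics measure, here is a minimal sketch assuming binary relevance (a resume/job pair counts as relevant if it carries an interaction label) and a cutoff `k`. The helper names and example IDs are hypothetical; the paper's exact cutoffs and averaging scheme are not given in this summary.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance nDCG@k: DCG of the ranking divided by the best possible DCG."""
    # A relevant item at 0-based position p contributes 1 / log2(p + 2).
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # The ideal ranking puts all relevant items first; at most k of them count.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical resume-ranking example: resumes ranked for one job post.
    ranked_resumes = ["r3", "r1", "r7", "r2"]
    applied = {"r1", "r2"}
    print(recall_at_k(ranked_resumes, applied, k=3))  # 0.5
    print(ndcg_at_k(ranked_resumes, applied, k=3))
```

Since the abstract describes the gains as average absolute improvements, a +13.8% recall gain would correspond to a difference of 0.138 in scores on this 0 to 1 scale, averaged over the job-ranking and resume-ranking tasks.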

Problem

Enhances resume-job matching accuracy
Addresses sparse interaction labels
Improves contrastive training techniques

Innovation

Hypothetical resume embedding
Hard-negative mining strategy
Contrastive training enhancement
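
The summary does not spell out how the Runner-Up strategy selects negatives. One plausible reading of the name, assumed here purely for illustration, is: score all unlabeled candidates for a query, treat the very top matches as possible unlabeled positives and skip them, then take the next-ranked "runner-up" candidates as hard negatives. The function `runner_up_negatives` and its `skip`/`num_neg` parameters are hypothetical and should not be read as the paper's algorithm.

```python
from typing import Dict, List, Set


def runner_up_negatives(
    scores: Dict[str, float],
    positives: Set[str],
    skip: int,
    num_neg: int,
) -> List[str]:
    """Hypothetical runner-up selection (an assumption, not ConFit v2's code).

    scores: similarity of each candidate to the query, e.g. from a retriever.
    positives: candidates with an interaction label for this query.
    skip: how many top-ranked unlabeled candidates to discard as
        possible false negatives.
    num_neg: how many hard negatives to return.
    """
    # Only candidates without a positive label may serve as negatives.
    unlabeled = [c for c in scores if c not in positives]
    # Rank by similarity, highest first; break ties by id for determinism.
    ranked = sorted(unlabeled, key=lambda c: (-scores[c], c))
    # Skip the closest matches and keep the runner-ups just behind them.
    return ranked[skip:skip + num_neg]


if __name__ == "__main__":
    # Hypothetical similarity scores between one resume and six job posts.
    job_scores = {"j1": 0.95, "j2": 0.90, "j3": 0.85, "j4": 0.80, "j5": 0.40, "j6": 0.10}
    print(runner_up_negatives(job_scores, positives={"j1"}, skip=1, num_neg=2))
    # prints ['j3', 'j4']
```

Skipping the top of the list trades a little hardness for fewer false negatives, which matters when labels are as sparse as the abstract describes; whether the paper uses a fixed offset, a rank window, or a score threshold is not stated here.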

Authors
Xiao Yu
Columbia University
Ruize Xu
Columbia University
Chengyuan Xue
University of Toronto
Jinzhong Zhang
Intellipro Group Inc.
Zhou Yu
Columbia University