🤖 AI Summary
In resume-job matching, interaction labels are extremely sparse because each candidate applies to only a few positions, which severely limits recommendation performance. To address this, ConFit v2 combines generative data augmentation with high-quality hard-negative mining in a contrastive learning framework. Specifically, it uses a large language model (LLM) to synthesize hypothetical reference resumes for job posts, alleviating the scarcity of ground-truth labels, and introduces a Runner-Up strategy that mines semantically close hard negatives from unlabeled resume/job pairs. Built on a dual-tower encoder architecture, the method achieves state-of-the-art results on two real-world datasets, outperforming the prior best method ConFit as well as BM25 and OpenAI's text-embedding-003, with average absolute gains of 13.8% in recall and 17.5% in nDCG.
📝 Abstract
A reliable resume-job matching system helps a company recommend suitable candidates from a pool of resumes and helps a job seeker find relevant jobs from a list of job posts. However, since job seekers apply to only a few jobs, interaction labels in resume-job datasets are sparse. We introduce ConFit v2, an improvement over ConFit that tackles this sparsity problem. We propose two techniques to enhance the encoder's contrastive training process: augmenting job data with hypothetical reference resumes generated by a large language model, and creating high-quality hard negatives from unlabeled resume/job pairs using a novel hard-negative mining strategy. We evaluate ConFit v2 on two real-world datasets and demonstrate that it outperforms ConFit and prior methods (including BM25 and OpenAI text-embedding-003), achieving an average absolute improvement of 13.8% in recall and 17.5% in nDCG across job-ranking and resume-ranking tasks.
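The two training ideas above can be illustrated with a minimal numpy sketch. This is an assumption-laden toy, not ConFit v2's actual implementation: it reads the "Runner-Up" strategy as ranking unlabeled candidates by a base encoder's score, skipping the very top hits (which may be unlabeled true matches), and keeping the next-best as hard negatives, which then enter an InfoNCE-style contrastive loss alongside the positive (the function names, `k_skip`/`k_neg` parameters, and temperature value are illustrative choices, not taken from the paper).

```python
import numpy as np

def runner_up_hard_negatives(scores, k_skip=1, k_neg=2):
    """For each query (row of `scores`), rank candidates best-first,
    skip the top `k_skip` hits (possible unlabeled positives), and
    keep the next `k_neg` indices as hard negatives."""
    order = np.argsort(-scores, axis=1)          # best-first ranking per row
    return order[:, k_skip:k_skip + k_neg]

def info_nce_with_hard_negatives(q, pos, negs, tau=0.05):
    """InfoNCE-style loss for one L2-normalized query embedding `q`,
    its positive `pos`, and mined hard negatives `negs` (rows)."""
    logits = np.concatenate([[q @ pos], negs @ q]) / tau
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # positive sits at index 0

# Toy base-encoder scores: 2 jobs x 4 unlabeled resumes.
scores = np.array([
    [0.9, 0.7, 0.6, 0.1],
    [0.2, 0.8, 0.75, 0.3],
])
negs = runner_up_hard_negatives(scores)          # [[1, 2], [2, 3]]

# Toy embeddings: query aligned with its positive, negatives orthogonal/opposed.
q = np.array([1.0, 0.0])
pos = np.array([1.0, 0.0])
neg_embs = np.array([[0.0, 1.0], [-1.0, 0.0]])
loss = info_nce_with_hard_negatives(q, pos, neg_embs)
```

In a full pipeline, `scores` would come from a pretrained dual-tower encoder scoring unlabeled resume/job pairs, and the mined negatives would be re-encoded and batched into the contrastive objective together with LLM-generated hypothetical resumes on the positive side.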