Single- vs. Dual-Prompt Dialogue Generation with LLMs for Job Interviews in Human Resources

📅 2025-02-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Real-world HR interview dialogue data is scarce, hindering training and evaluation of dialogue systems. Method: This paper systematically compares single-prompt versus dual-prompt (i.e., two-agent collaborative) paradigms for synthetic interview dialogue generation. It introduces a pairwise comparative evaluation framework powered by a judge-style large language model (LLM), enabling the first quantitative, reproducible comparison of generation paradigms in HR interviewing. Results: The dual-prompt approach significantly improves dialogue realism—its outputs are judged as “human-written” up to 10× more often than those from single-prompt baselines. This advantage is robust across diverse generator-judge model pairs (e.g., GPT-4o and Llama 3.3 70B) at roughly six times the token cost. The core contribution is establishing a novel multi-agent generation + judge-based evaluation paradigm, offering a generalizable methodology for low-resource HR dialogue synthesis.
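The dual-prompt paradigm can be sketched as two separately prompted agents taking turns on a shared transcript. The sketch below is a minimal illustration, not the paper's implementation: `call_llm`, both system prompts, and the turn count are hypothetical stand-ins, and the LLM call is stubbed so the code runs without a model backend.

```python
# Hypothetical system prompts for the two agents (assumptions, not from the paper).
INTERVIEWER_PROMPT = "You are an HR interviewer. Ask one question at a time."
CANDIDATE_PROMPT = "You are a job candidate. Answer the interviewer's question."

def call_llm(system_prompt: str, history: list[dict]) -> str:
    # Placeholder: a real implementation would send `system_prompt` plus
    # `history` to a chat-completion endpoint (e.g. GPT-4o or Llama 3.3 70B).
    role = "interviewer" if "interviewer" in system_prompt else "candidate"
    return f"({role} utterance, turn {len(history) // 2 + 1})"

def generate_interview(num_turns: int = 3) -> list[dict]:
    """Alternate between two separately prompted agents to build one dialogue."""
    dialogue: list[dict] = []
    for _ in range(num_turns):
        # Each agent sees the full transcript so far, but only its own prompt.
        question = call_llm(INTERVIEWER_PROMPT, dialogue)
        dialogue.append({"speaker": "interviewer", "text": question})
        answer = call_llm(CANDIDATE_PROMPT, dialogue)
        dialogue.append({"speaker": "candidate", "text": answer})
    return dialogue
```

Each turn requires two model calls over a growing transcript, which is consistent with the reported multiple-fold token cost relative to generating the whole interview from a single prompt.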

📝 Abstract
Optimizing language models for use in conversational agents requires large quantities of example dialogues. Increasingly, these dialogues are synthetically generated by using powerful large language models (LLMs), especially in domains with challenges to obtain authentic human data. One such domain is human resources (HR). In this context, we compare two LLM-based dialogue generation methods for the use case of generating HR job interviews, and assess whether one method generates higher-quality dialogues that are more challenging to distinguish from genuine human discourse. The first method uses a single prompt to generate the complete interview dialogue. The second method uses two agents that converse with each other. To evaluate dialogue quality under each method, we ask a judge LLM to determine whether AI was used for interview generation, using pairwise interview comparisons. We demonstrate that despite a sixfold increase in token cost, interviews generated with the dual-prompt method achieve a win rate up to ten times higher than those generated with the single-prompt method. This difference remains consistent regardless of whether GPT-4o or Llama 3.3 70B is used for either interview generation or judging quality.
Problem

Research questions and friction points this paper is trying to address.

Compare single vs. dual-prompt LLM methods
Generate HR job interview dialogues
Assess dialogue quality and human likeness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-prompt method enhances dialogue quality
LLMs generate synthetic HR interview dialogues
Judge LLM evaluates AI-generated interview authenticity
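The judge-based evaluation above can be sketched as a pairwise comparison: the judge sees one dual-prompt and one single-prompt interview and flags the one it believes is AI-generated. This is an illustrative sketch under stated assumptions: `judge_llm` is a hypothetical stand-in stubbed with an arbitrary heuristic so the code runs offline, and the win-rate bookkeeping is a plausible reading of the setup, not the paper's exact protocol.

```python
def judge_llm(interview_a: str, interview_b: str) -> str:
    # Placeholder: a real judge LLM would be prompted with both transcripts
    # and asked which one was generated by AI. This stub flags the shorter
    # transcript as AI-generated (an arbitrary stand-in heuristic).
    return "A" if len(interview_a) < len(interview_b) else "B"

def win_rate(method_outputs: list[str], baseline_outputs: list[str]) -> float:
    """Fraction of pairs where the method's interview (slot A) survives the
    judge, i.e. the baseline in slot B is flagged as AI-generated instead."""
    wins = 0
    for method, baseline in zip(method_outputs, baseline_outputs):
        flagged = judge_llm(method, baseline)  # slot the judge flags as AI
        if flagged == "B":  # baseline flagged -> method's interview "wins"
            wins += 1
    return wins / len(method_outputs)
```

In the paper's setting, the dual-prompt method's win rate against the single-prompt baseline is what reaches up to ten times higher; in practice one would also swap slot positions to control for any A/B position bias in the judge.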