🤖 AI Summary
During requirements elicitation interviews, experienced interviewers often struggle to formulate high-quality follow-up questions in real time due to domain knowledge gaps, high cognitive load, and information overload.
Method: We propose an error-guided prompting framework that integrates a taxonomy of common interviewer errors with the GPT-4o large language model, enabling dynamic generation of contextually relevant questions grounded in interviewee utterances. Structured prompting enhances question relevance and informativeness.
Contribution/Results: Empirical evaluation shows that LLM-generated questions match human-authored questions in clarity, relevance, and informativeness—and significantly outperform baseline LLM approaches when guided by error categories. Our approach establishes an interpretable, reusable generative paradigm for intelligent requirements engineering assistance, effectively alleviating interviewer cognitive burden while improving both the quality and efficiency of requirements elicitation.
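To make the error-guided prompting idea concrete, the sketch below shows how a follow-up question prompt might be composed from an interviewer-mistake category and the interviewee's last utterance. The taxonomy entries, template wording, and function names are illustrative assumptions, not the paper's actual categories or prompts.

```python
# Minimal sketch of error-guided prompt construction.
# MISTAKE_TYPES is a hypothetical taxonomy; the paper's real
# error categories and prompt template are not reproduced here.

MISTAKE_TYPES = {
    "vague_answer": "The interviewee's answer is ambiguous or lacks detail.",
    "unexplored_topic": "A topic the interviewee mentioned was not followed up on.",
    "missing_rationale": "The interviewee stated a need without explaining why.",
}

def build_prompt(interviewee_utterance: str, mistake_type: str) -> str:
    """Compose a structured prompt that grounds question generation in the
    interviewee's utterance and a targeted interviewer-error category."""
    if mistake_type not in MISTAKE_TYPES:
        raise ValueError(f"unknown mistake type: {mistake_type}")
    return (
        "You are assisting a requirements elicitation interview.\n"
        f'Interviewee said: "{interviewee_utterance}"\n'
        f"Common interviewer mistake to avoid: {MISTAKE_TYPES[mistake_type]}\n"
        "Generate one clear, relevant follow-up question that addresses this gap."
    )

prompt = build_prompt(
    "We mostly need the reports to be faster.", "missing_rationale"
)
print(prompt)
```

The resulting string would be sent to the LLM (e.g., via a chat-completion API); conditioning each request on a single error category is what lets the framework steer generation toward the gap the interviewer is most likely to miss.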
📝 Abstract
Interviews are a widely used requirements elicitation technique for gathering stakeholder needs, preferences, and expectations for a software system. Effective interviewing requires skilled interviewers to formulate appropriate interview questions in real time while facing multiple challenges, including lack of familiarity with the domain, excessive cognitive load, and information overload that hinders their ability to process stakeholders' speech. Recently, large language models (LLMs) have exhibited state-of-the-art performance in multiple natural language processing tasks, including text summarization and entailment. To support interviewers, we investigate the application of GPT-4o to generate follow-up interview questions during requirements elicitation by building on a framework of common interviewer mistake types. In addition, we describe methods to generate questions based on interviewee speech. We report a controlled experiment to evaluate LLM-generated and human-authored questions with minimal guidance, and a second controlled experiment to evaluate the LLM-generated questions when generation is guided by interviewer mistake types. Our findings demonstrate that, for both experiments, the LLM-generated questions are no worse than the human-authored questions with respect to clarity, relevance, and informativeness. In addition, LLM-generated questions outperform human-authored questions when guided by common mistake types. This highlights the potential of using LLMs to help interviewers improve the quality and ease of requirements elicitation interviews in real time.