📝 Abstract
Transformer-based Large Language Models (LLMs) have paved the way for "AI interviewers" that can administer voice-based surveys with respondents in real time. This position paper reviews emerging evidence to assess when such AI interviewing systems are fit for purpose for data collection in quantitative and qualitative research contexts. We evaluate the capabilities of AI interviewers and of current Interactive Voice Response (IVR) systems along two dimensions: input/output performance (i.e., speech recognition, answer recording, emotion handling) and verbal reasoning (i.e., the ability to probe, clarify, and handle branching logic). Field studies suggest that AI interviewers already exceed IVR capabilities for both quantitative and qualitative data collection, but real-time transcription error rates, limited emotion-detection abilities, and uneven follow-up quality indicate that the utility and adoption of current AI interviewer technology may be context-dependent for qualitative data collection efforts.