🤖 AI Summary
Existing video question generation (VQG) evaluation overemphasizes “answerability” while neglecting a question’s capacity to elicit implicit expert knowledge. Method: We propose the first knowledge-aware, expert-centered VQG evaluation paradigm, featuring: (1) EgoExoAsk—a large-scale, real-world expert-annotated dataset of 27,666 QA pairs derived from Ego-Exo4D videos, integrating first- and third-person visual perspectives with expert commentary; (2) a cross-modal question–answer retrieval framework trained to learn expert-aligned semantic associations; and (3) a novel, retrieval-based VQG quality metric that is both quantifiable and generalizable. Results: Our metric correlates strongly with a model’s ability to utilize contextual knowledge and effectively discriminates among VQG models in how well they elicit implicit knowledge. The EgoExoAsk dataset is publicly released to advance interpretable, human-centered video-language modeling.
📝 Abstract
Skilled human interviewers can extract valuable information from experts. This raises a fundamental question: what makes some questions more effective than others? Answering it requires a quantitative evaluation of question-generation models. Video question generation (VQG) is a task related to video question answering (VideoQA), in which questions are generated for given answers. VQG evaluation typically focuses on whether the generated questions can be answered, rather than on the quality of the questions themselves. In contrast, we focus on question quality in eliciting unseen knowledge from human experts. To enable continuous improvement of VQG models, we propose a protocol that evaluates this ability by simulating question-answering communication with experts through question-to-answer retrieval. We obtain the retriever by constructing a novel dataset, EgoExoAsk, which comprises 27,666 QA pairs generated from Ego-Exo4D's expert commentary annotations. The EgoExoAsk training set is used to train the retriever, and the benchmark is constructed on the validation set with Ego-Exo4D video segments. Experimental results demonstrate that our metric aligns well with question-generation settings: models with access to richer context score higher, supporting that our protocol works as intended. The EgoExoAsk dataset is available at https://github.com/omron-sinicx/VQG4ExpertKnowledge .
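To make the retrieval-based protocol concrete, here is a minimal sketch of how a question-to-answer retrieval metric can be computed. This is not the paper's implementation: the actual retriever is a trained cross-modal model, whereas this sketch substitutes a toy bag-of-words similarity, and the pairing convention (question *i* targets answer *i*, with all answers forming the retrieval pool) is an illustrative simplification. All function names here are hypothetical.

```python
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words vector. The paper's retriever is a trained
    # cross-modal model; this stand-in is purely illustrative.
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieval_score(questions, answers):
    """Mean reciprocal rank (MRR) of each question's paired answer.

    questions[i] is assumed to target answers[i]; all answers form the
    retrieval pool. A question that ranks its intended expert answer
    higher is scored as a better knowledge-eliciting question.
    """
    ans_vecs = [embed(a) for a in answers]
    reciprocal_ranks = []
    for i, q in enumerate(questions):
        qv = embed(q)
        sims = [cosine(qv, av) for av in ans_vecs]
        # 1-based rank of the ground-truth answer among all candidates
        rank = 1 + sum(1 for j, s in enumerate(sims)
                       if j != i and s > sims[i])
        reciprocal_ranks.append(1.0 / rank)
    return sum(reciprocal_ranks) / len(reciprocal_ranks)
```

Under this scheme, a VQG model whose generated questions pull their target expert answers to the top of the ranking receives a higher score, which is the intuition behind using retrieval as a proxy for simulated expert communication.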