Evaluating the Capability of Video Question Generation for Expert Knowledge Elicitation

📅 2025-12-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing video question generation (VQG) evaluation overemphasizes “answerability” while neglecting questions’ capacity to elicit implicit expert knowledge. Method: We propose the first knowledge-aware, expert-centered VQG evaluation paradigm, featuring: (1) EgoExoAsk—a large-scale, real-world expert-annotated dataset of 27,666 QA pairs derived from Ego-Exo4D videos, integrating first- and third-person visual perspectives with expert commentary; (2) a cross-modal question–answer retrieval framework trained to learn expert-aligned semantic associations; and (3) a novel, retrieval-based VQG quality metric that is both quantifiable and generalizable. Results: Our metric exhibits strong positive correlation with models’ contextual knowledge utilization capability and effectively discriminates among VQG models in terms of implicit knowledge elicitation performance. The EgoExoAsk dataset is publicly released to advance interpretable, human-centered video-language modeling.

📝 Abstract
Skilled human interviewers can extract valuable information from experts. This raises a fundamental question: what makes some questions more effective than others? Answering it requires a quantitative evaluation of question-generation models. Video question generation (VQG) is a subtask of video question answering (VideoQA) in which questions are generated for given answers. VQG evaluation typically focuses on whether the generated questions can be answered, rather than on their quality. In contrast, we focus on question quality in eliciting unseen knowledge from human experts. To enable continuous improvement of VQG models, we propose a protocol that evaluates this ability by simulating question-answering communication with experts through question-to-answer retrieval. We train the retriever on a novel dataset, EgoExoAsk, which comprises 27,666 QA pairs generated from Ego-Exo4D's expert commentary annotations. The EgoExoAsk training set is used to train the retriever, and the benchmark is constructed on the validation set with Ego-Exo4D video segments. Experimental results demonstrate that our metric aligns well with question-generation settings: models with access to richer context score higher, supporting that the protocol works as intended. The EgoExoAsk dataset is available at https://github.com/omron-sinicx/VQG4ExpertKnowledge .
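The retrieval-based protocol can be illustrated with a minimal sketch. The names and toy vectors below are hypothetical stand-ins for embeddings produced by a trained question-to-answer retriever; the idea is that a generated question is scored by how highly it ranks its target expert answer among all candidate answers.

```python
import numpy as np

def answer_rank(question_emb, answer_embs, target_idx):
    """1-based rank of the target answer under cosine similarity.

    `question_emb` and `answer_embs` stand in for embeddings from a
    trained question-to-answer retriever (hypothetical here).
    """
    q = question_emb / np.linalg.norm(question_emb)
    a = answer_embs / np.linalg.norm(answer_embs, axis=1, keepdims=True)
    sims = a @ q               # similarity of the question to each candidate answer
    order = np.argsort(-sims)  # best-matching answers first
    return int(np.where(order == target_idx)[0][0]) + 1

def mrr(ranks):
    """Mean reciprocal rank: higher means questions retrieve their answers better."""
    return float(np.mean([1.0 / r for r in ranks]))

# Toy example: 3 candidate answers, question embedding closest to answer 0
answers = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
question = np.array([0.9, 0.1])
r = answer_rank(question, answers, target_idx=0)
print(r, mrr([r]))
```

A VQG model that accesses richer context should produce questions whose target answers rank higher on average, which is the behavior the paper's experiments report.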
Problem

Research questions and friction points this paper is trying to address.

Evaluates video question generation for expert knowledge elicitation
Proposes a protocol to assess question quality via simulated expert communication
Introduces a dataset to benchmark models on generating effective questions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposed protocol simulates expert communication via question-to-answer retrieval
Constructed EgoExoAsk dataset with 27,666 QA pairs from expert annotations
Metric evaluates question quality for eliciting unseen expert knowledge
Huaying Zhang
OMRON SINIC X Corp., Japan, Hokkaido University, Japan
Atsushi Hashimoto
OMRON SINIC X Corp., Japan
Tosho Hirasawa
OMRON SINIC X Corp., Japan
Natural Language Processing, Multimodal Learning, Machine Translation