Understanding Complexity in VideoQA via Visual Program Generation

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the unreliability and poor scalability of human-annotated difficulty labels in Video Question Answering (VideoQA). The authors propose a data-driven framework that uses the complexity of automatically generated visual programs as a proxy for model-perceived question difficulty. Methodologically, they design a fine-grained complexity estimation algorithm that combines visual program synthesis, code generation models, and automatic question generation, and validate it through correlation analysis against model performance. The key contributions are threefold: (1) generated code complexity is introduced into VideoQA difficulty modeling for the first time; (2) the complexity-based metric correlates significantly more strongly with model performance than human annotations do; and (3) a new benchmark constructed with the framework is 1.9× harder than NExT-QA, establishing a reproducible, scalable, and automated paradigm for evaluating difficult VideoQA.

📝 Abstract
We propose a data-driven approach to analyzing query complexity in Video Question Answering (VideoQA). Previous efforts in benchmark design have relied on human expertise to design challenging questions, yet we experimentally show that humans struggle to predict which questions are difficult for machine learning models. Our automatic approach leverages recent advances in code generation for visual question answering, using the complexity of generated code as a proxy for question difficulty. We demonstrate that this measure correlates significantly better with model performance than human estimates. To operationalize this insight, we propose an algorithm for estimating question complexity from code. It identifies fine-grained primitives that correlate with the hardest questions for any given set of models, making it easy to scale to new approaches in the future. Finally, to further illustrate the utility of our method, we extend it to automatically generate complex questions, constructing a new benchmark that is 1.9 times harder than the popular NExT-QA.
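The core idea of using generated-code complexity as a difficulty proxy can be sketched as follows. The snippet below is an illustrative toy, not the paper's actual algorithm: it scores a generated visual program by counting its primitive calls, on the assumption that programs invoking more primitives correspond to harder questions. The primitive names (`detect`, `filter_before`, `event`, `count`) are hypothetical placeholders for whatever API the code generation model targets.

```python
import ast


def program_complexity(code: str) -> int:
    """Toy complexity proxy: count call-expression nodes in a
    generated visual program. More primitive calls is taken to
    indicate a harder question (illustrative heuristic only)."""
    tree = ast.parse(code)
    return sum(isinstance(node, ast.Call) for node in ast.walk(tree))


# Hypothetical generated programs for two VideoQA questions.
easy = "detect(video, 'dog')"
hard = (
    "objs = detect(video, 'person')\n"
    "before = filter_before(objs, event('door opens'))\n"
    "answer = count(before)"
)

print(program_complexity(easy))  # 1 primitive call
print(program_complexity(hard))  # 4 primitive calls
```

In the paper's framework, a score like this would then be correlated with per-question model accuracy to check that it tracks difficulty better than human estimates; the published method uses a finer-grained analysis of program primitives than a flat call count.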
Problem

Research questions and friction points this paper is trying to address.

Analyzing query complexity in VideoQA using data-driven methods
Automating question difficulty prediction via code generation complexity
Generating harder benchmarks for VideoQA model evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages code generation for VideoQA complexity
Algorithm estimates question difficulty from code
Automatically generates harder benchmark questions
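The second innovation, identifying fine-grained primitives that correlate with the hardest questions, can be illustrated with a simple frequency heuristic. This sketch ranks each primitive by the failure rate of the questions whose generated programs use it; the primitive names and the scoring rule are assumptions for illustration, not the paper's exact estimation algorithm.

```python
import ast
from collections import Counter


def primitive_names(code: str) -> list[str]:
    """Extract the names of primitives called in a generated program."""
    return [n.func.id for n in ast.walk(ast.parse(code))
            if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)]


def hard_primitive_scores(programs: list[str],
                          model_correct: list[bool]) -> dict[str, float]:
    """Score each primitive by the fraction of questions using it
    that the model answered incorrectly (illustrative heuristic)."""
    used, failed = Counter(), Counter()
    for prog, ok in zip(programs, model_correct):
        for prim in set(primitive_names(prog)):
            used[prim] += 1
            if not ok:
                failed[prim] += 1
    return {p: failed[p] / used[p] for p in used}


# Hypothetical programs and per-question model outcomes.
programs = ["detect(v)", "count(detect(v))", "count(track(v))"]
scores = hard_primitive_scores(programs, [True, False, False])
print(scores)  # primitives with higher failure rates flag harder questions
```

Primitives with the highest failure rates would then guide the automatic generation of new, harder questions; because the scores are computed from any model set's outcomes, the procedure scales to new approaches without re-annotation.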