FineQuest: Adaptive Knowledge-Assisted Sports Video Understanding via Agent-of-Thoughts Reasoning

📅 2025-09-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the weak domain understanding and shallow reasoning of large language models (LLMs) in sports video question answering (VideoQA), this paper proposes a training-free dual-mode reasoning framework that combines reactive and deliberative reasoning. The authors construct SSGraph, the first multimodal sports knowledge graph covering nine sports, to enhance domain-specific semantics. Inspired by cognitive science, they introduce a “thinking agent” architecture that jointly performs visual instance recognition and domain terminology alignment for knowledge grounding. They also propose a zero-shot multimodal scene graph modeling method to capture spatiotemporal relations in sports videos. Alongside the framework, they release two new benchmarks, Gym-QA and Diving-QA. The approach achieves state-of-the-art performance on Gym-QA, Diving-QA, and SPORTU while preserving strong generalization across standard VideoQA tasks.

📝 Abstract
Video Question Answering (VideoQA) based on Large Language Models (LLMs) has shown potential in general video understanding but faces significant challenges when applied to the inherently complex domain of sports videos. In this work, we propose FineQuest, the first training-free framework that leverages dual-mode reasoning inspired by cognitive science: i) Reactive Reasoning for straightforward sports queries and ii) Deliberative Reasoning for more complex ones. To bridge the knowledge gap between general-purpose models and domain-specific sports understanding, FineQuest incorporates SSGraph, a multimodal sports knowledge scene graph spanning nine sports, which encodes both visual instances and domain-specific terminology to enhance reasoning accuracy. Furthermore, we introduce two new sports VideoQA benchmarks, Gym-QA and Diving-QA, derived from the FineGym and FineDiving datasets, enabling diverse and comprehensive evaluation. FineQuest achieves state-of-the-art performance on these benchmarks as well as the existing SPORTU dataset, while maintaining strong general VideoQA capabilities.
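The dual-mode idea in the abstract can be illustrated with a toy router. This is only a sketch under stated assumptions, not the FineQuest implementation: the cue-word heuristic, the `is_complex`, `lookup_terms`, and `answer` functions, and the dictionary standing in for SSGraph are all hypothetical, since the abstract does not say how query complexity is judged or how the graph is queried. The returned strings are placeholders for model calls.

```python
from dataclasses import dataclass

# Hypothetical cue phrases; the abstract does not specify how complexity is judged.
COMPLEX_CUES = ("why", "how many", "sequence", "before", "after", "compare")


@dataclass
class Answer:
    mode: str   # "reactive" or "deliberative"
    text: str   # placeholder for what a model call would return


def is_complex(question: str) -> bool:
    """Toy heuristic: a question is complex if it contains any cue phrase."""
    q = question.lower()
    return any(cue in q for cue in COMPLEX_CUES)


def lookup_terms(question: str, knowledge: dict[str, str]) -> list[str]:
    """Return definitions whose term appears in the question.

    `knowledge` is a plain dict standing in for a sports knowledge graph.
    """
    q = question.lower()
    return [definition for term, definition in knowledge.items() if term in q]


def answer(question: str, knowledge: dict[str, str]) -> Answer:
    """Route simple questions to a direct path and complex ones to a grounded path."""
    if not is_complex(question):
        return Answer("reactive", "direct answer from the base model")
    facts = lookup_terms(question, knowledge)
    return Answer(
        "deliberative",
        f"multi-step answer grounded in {len(facts)} knowledge entries",
    )
```

For instance, `answer("What sport is shown?", {})` takes the reactive path, while a question containing "how many" or "before" is routed to the deliberative path and picks up any matching domain terms first.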
Problem

Research questions and friction points this paper is trying to address.

Adaptive reasoning for sports video question answering
Bridging knowledge gap in domain-specific sports understanding
Creating benchmarks for comprehensive sports VideoQA evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free dual-mode reasoning framework
Multimodal sports knowledge graph SSGraph
New sports VideoQA benchmarks introduced
Haodong Chen
School of Automation, Northwestern Polytechnical University
Haojian Huang
The University of Hong Kong, Hong Kong, China
Xinxiang Yin
School of Software, Northwestern Polytechnical University, Xi’an City, China
Dian Shao
Associate Professor, Northwestern Polytechnical University, Xi'an
computer vision, deep learning, UAV