FineQuest: Adaptive Knowledge-Assisted Sports Video Understanding via Agent-of-Thoughts Reasoning

📅 2025-09-15

🤖 AI Summary

To address the weak domain understanding and shallow reasoning of large language models (LLMs) in sports video question answering (VideoQA), this paper proposes a training-free dual-mode reasoning framework that combines reactive and deliberative reasoning. The authors construct SSGraph, presented as the first multimodal sports knowledge graph covering nine sports, to supply domain-specific semantics. Inspired by cognitive science, they introduce a “thinking agent” architecture that jointly performs visual instance recognition and domain terminology alignment for knowledge grounding. They also propose a zero-shot multimodal scene graph modeling method to capture spatiotemporal relations in sports videos. Alongside the framework, they release two new benchmarks, Gym-QA and Diving-QA. The approach achieves state-of-the-art performance on Gym-QA, Diving-QA, and SPORTU while preserving strong generalization across standard VideoQA tasks.

📝 Abstract

Video Question Answering (VideoQA) based on Large Language Models (LLMs) has shown potential in general video understanding but faces significant challenges when applied to the inherently complex domain of sports videos. In this work, we propose FineQuest, the first training-free framework that leverages dual-mode reasoning inspired by cognitive science: i) Reactive Reasoning for straightforward sports queries and ii) Deliberative Reasoning for more complex ones. To bridge the knowledge gap between general-purpose models and domain-specific sports understanding, FineQuest incorporates SSGraph, a multimodal sports knowledge scene graph spanning nine sports, which encodes both visual instances and domain-specific terminology to enhance reasoning accuracy. Furthermore, we introduce two new sports VideoQA benchmarks, Gym-QA and Diving-QA, derived from the FineGym and FineDiving datasets, enabling diverse and comprehensive evaluation. FineQuest achieves state-of-the-art performance on these benchmarks as well as the existing SPORTU dataset, while maintaining strong general VideoQA capabilities.
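The dual-mode idea in the abstract (a fast path for straightforward queries, a slower path for complex ones) can be illustrated with a minimal routing sketch. This is not FineQuest's implementation: the abstract does not say how query difficulty is judged, so the keyword check `is_complex`, the cue list `COMPLEX_CUES`, and the `answer_query` helper are all hypothetical names chosen for illustration, and the two reasoning paths are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical cue phrases. The source does not specify how difficulty is
# assessed; a keyword check is used here only to keep the sketch runnable.
COMPLEX_CUES = ("why", "how many", "compare", "sequence", "before", "after")


@dataclass
class Answer:
    mode: str  # "reactive" or "deliberative"
    text: str


def is_complex(question: str) -> bool:
    """Toy difficulty check: True if the question contains any cue phrase."""
    lowered = question.lower()
    return any(cue in lowered for cue in COMPLEX_CUES)


def answer_query(
    question: str,
    reactive: Callable[[str], str],
    deliberative: Callable[[str], str],
) -> Answer:
    """Route a question to the reactive or the deliberative path."""
    if is_complex(question):
        # Assumed slow path: multi-step reasoning, e.g. with knowledge lookup.
        return Answer("deliberative", deliberative(question))
    # Assumed fast path: answer directly from the video and question.
    return Answer("reactive", reactive(question))
```

In use, `reactive` and `deliberative` would wrap whatever model calls each path needs; with stub callables, a question such as "What sport is shown?" takes the reactive path, while "How many somersaults happen before entry?" is sent to the deliberative one.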

Problem

Research questions and friction points this paper is trying to address.

Adaptive reasoning for sports video question answering

Bridging knowledge gap in domain-specific sports understanding

Creating benchmarks for comprehensive sports VideoQA evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free dual-mode reasoning framework

Multimodal sports knowledge graph SSGraph

New sports VideoQA benchmarks introduced

🔎 Similar Papers

ExpertAF: Expert Actionable Feedback from Video