TimeScope: Towards Task-Oriented Temporal Grounding In Long Videos

📅 2025-09-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper introduces Task-Oriented Temporal Grounding (ToTG), a new problem: given a task's natural-language description, localize the time intervals in a long video that contain the information the task needs. The authors present ToTG Bench, a dedicated benchmark for evaluating ToTG, and ToTG Pile, a curated high-quality dataset, and propose TimeScope, a progressive reasoning framework. TimeScope first identifies a coarse-grained temporal scope that likely contains the key moments, then refines the start and end of that scope through fine-grained localization. In the reported experiments, TimeScope consistently outperforms both existing temporal grounding methods and state-of-the-art multimodal large language models (MLLMs) across diverse evaluation settings.

📝 Abstract

Identifying key moments in long videos is essential for downstream understanding and reasoning tasks. In this paper, we introduce a new problem, Task-oriented Temporal Grounding (ToTG), which aims to localize time intervals containing the necessary information based on a task's natural-language description. Along with the definition, we also present ToTG Bench, a comprehensive benchmark for evaluating performance on ToTG. ToTG is particularly challenging for traditional approaches due to their limited generalizability and difficulty in handling long videos. To address these challenges, we propose TimeScope, a novel framework built upon progressive reasoning. TimeScope first identifies a coarse-grained temporal scope in the long video that likely contains the key moments, and then refines this scope through fine-grained moment partitioning. Additionally, we curate a high-quality dataset, namely ToTG Pile, to enhance TimeScope's ability to perform progressive temporal grounding effectively. Extensive experiments demonstrate that TimeScope consistently outperforms both existing temporal grounding methods and popular MLLMs across various settings, highlighting its effectiveness in addressing this new and challenging problem.

Problem

Research questions and friction points this paper is trying to address.

Localizing task-relevant time intervals in long videos

Addressing limited generalizability of traditional temporal grounding methods

Handling the difficulty that traditional approaches have with lengthy video content

Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive reasoning framework for temporal grounding

Coarse-to-fine temporal scope refinement approach

High-quality dataset curation for enhanced performance
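
The coarse-to-fine idea can be illustrated with a minimal sketch. This is not the authors' implementation: TimeScope reasons with a multimodal model, whereas the sketch assumes a hypothetical list of per-second relevance scores is already available for the task. The function names, the fixed window size, and the threshold are all assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # [start, end) in seconds


def coarse_scope(scores: List[float], window: int) -> Interval:
    """Coarse stage: pick the fixed-size window with the highest mean score."""
    best_start, best_mean = 0, float("-inf")
    for start in range(0, len(scores), window):
        chunk = scores[start:start + window]
        mean = sum(chunk) / len(chunk)
        if mean > best_mean:
            best_start, best_mean = start, mean
    return best_start, min(best_start + window, len(scores))


def refine_scope(scores: List[float], scope: Interval, threshold: float) -> Interval:
    """Fine stage: shrink the scope to the span between the first and last
    second whose score reaches the threshold (hypothetical refinement rule)."""
    start, end = scope
    hits = [t for t in range(start, end) if scores[t] >= threshold]
    if not hits:
        return scope  # nothing stands out, so keep the coarse scope
    return hits[0], hits[-1] + 1


def ground(scores: List[float], window: int = 10, threshold: float = 0.5) -> Interval:
    """Run the coarse stage, then refine its result."""
    return refine_scope(scores, coarse_scope(scores, window), threshold)


# Toy 60-second video: the relevant moment sits at seconds 32 to 34.
toy_scores = [0.1] * 30 + [0.2, 0.2, 0.9, 0.8, 0.9, 0.3, 0.1, 0.1, 0.1, 0.1] + [0.1] * 20
print(ground(toy_scores))  # (32, 35)
```

On the toy input, the coarse stage selects the window from second 30 to 40, and the fine stage narrows it to seconds 32 through 34. The point of the two stages is that the expensive fine-grained pass only runs inside a small part of the long video.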
