A Lightweight Moment Retrieval System with Global Re-Ranking and Robust Adaptive Bidirectional Temporal Search

📅 2025-04-12
🤖 AI Summary
To address low efficiency, weak temporal modeling, and incomplete contextual understanding in fine-grained moment retrieval from large-scale video corpora, this paper proposes an interactive video corpus moment retrieval framework. The method makes three contributions: (1) a SuperGlobal re-ranking mechanism that goes beyond local similarity modeling; (2) a robust Adaptive Bidirectional Temporal Search (ABTS) that balances temporal stability against computational cost; and (3) an integrated pipeline combining keyframe extraction, image-hashing-based deduplication, cross-modal similarity modeling, and dynamic temporal pruning. The authors report significantly reduced storage and computational overhead while maintaining high localization precision across heterogeneous, multi-source video repositories, and they present the framework as scalable and robust across domains.
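
As a rough illustration of what a global re-ranking pass can look like, the sketch below uses average query expansion, a classic and much simpler technique than the SuperGlobal mechanism named above. It is a stand-in for intuition only, not the authors' method. The function name `rerank_with_query_expansion`, the `top_k` parameter, and the toy 2-D embeddings are assumptions made for this example; it also assumes the text query and the keyframes are already embedded in a shared vector space.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank_with_query_expansion(
    query: Vector, candidates: List[Vector], top_k: int = 2
) -> List[int]:
    """Re-rank candidates against a query averaged with its top_k initial matches.

    Hypothetical helper: average query expansion, not SuperGlobal re-ranking.
    """
    # First pass: rank every candidate by similarity to the raw query.
    initial = sorted(
        range(len(candidates)),
        key=lambda i: cosine(query, candidates[i]),
        reverse=True,
    )
    # Pool the query with its best initial matches to form an expanded query.
    pooled = [query] + [candidates[i] for i in initial[:top_k]]
    expanded = [sum(v[d] for v in pooled) / len(pooled) for d in range(len(query))]
    # Second pass: rank again using the expanded query.
    return sorted(
        range(len(candidates)),
        key=lambda i: cosine(expanded, candidates[i]),
        reverse=True,
    )


if __name__ == "__main__":
    query = [1.0, 0.0]
    candidates = [[1.0, 0.3], [1.0, 0.35], [1.0, -0.4], [1.0, 0.8]]
    print(rerank_with_query_expansion(query, candidates))
```

In this toy input the first pass orders the candidates 0, 1, 2, 3. After expansion the query drifts toward candidates 0 and 1, so candidate 3 overtakes candidate 2. The point is only that the second pass uses information from the whole shortlist rather than from each query-candidate pair in isolation.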

📝 Abstract
The exponential growth of digital video content has posed critical challenges in moment-level video retrieval, where existing methodologies struggle to efficiently localize specific segments within an expansive video corpus. Current retrieval systems are constrained by computational inefficiencies, temporal context limitations, and the intrinsic complexity of navigating video content. In this paper, we address these limitations through a novel Interactive Video Corpus Moment Retrieval framework that integrates a SuperGlobal Reranking mechanism and Adaptive Bidirectional Temporal Search (ABTS), strategically optimizing query similarity, temporal stability, and computational resources. By preprocessing a large corpus of videos using a keyframe extraction model and deduplication technique through image hashing, our approach provides a scalable solution that significantly reduces storage requirements while maintaining high localization precision across diverse video repositories.
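
The deduplication step mentioned in the abstract can be pictured with a minimal sketch. The abstract does not say which image hash is used, so this example assumes a plain average hash (aHash) compared by Hamming distance. The names `average_hash`, `hamming`, and `deduplicate`, the 8x8 hash size, and the `max_distance` threshold are all hypothetical choices for illustration, and frames are assumed to be grayscale grids at least 8 pixels on each side.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # grayscale keyframe as rows of pixel intensities


def average_hash(frame: Frame, hash_size: int = 8) -> int:
    """Average hash: block-average down to hash_size x hash_size, threshold at the mean.

    Assumes the frame has at least hash_size rows and columns.
    """
    height, width = len(frame), len(frame[0])
    cells = []
    for i in range(hash_size):
        r0, r1 = i * height // hash_size, (i + 1) * height // hash_size
        for j in range(hash_size):
            c0, c1 = j * width // hash_size, (j + 1) * width // hash_size
            block = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: 1 if the cell is brighter than the global mean.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def deduplicate(frames: List[Frame], max_distance: int = 5) -> List[int]:
    """Return indices of frames to keep, dropping near-duplicates of earlier kept frames."""
    kept_indices: List[int] = []
    kept_hashes: List[int] = []
    for index, frame in enumerate(frames):
        h = average_hash(frame)
        if all(hamming(h, k) > max_distance for k in kept_hashes):
            kept_indices.append(index)
            kept_hashes.append(h)
    return kept_indices


if __name__ == "__main__":
    left_right = [[0] * 8 + [255] * 8 for _ in range(16)]
    left_right_dim = [[10] * 8 + [240] * 8 for _ in range(16)]  # same layout, new brightness
    top_bottom = [[0] * 16 for _ in range(8)] + [[255] * 16 for _ in range(8)]
    print(deduplicate([left_right, left_right_dim, top_bottom]))  # [0, 2]
```

The second frame differs from the first only in brightness, so it hashes identically and is dropped. The third has a different layout and is kept. The pairwise comparison against every kept hash is quadratic in the worst case, so a production system would likely bucket or index the hashes.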

Problem

Research questions and friction points this paper is trying to address.

Efficiently localize specific segments in a large video corpus
Overcome computational inefficiencies and temporal context limitations
Reduce storage needs while maintaining high localization precision

Innovation

Methods, ideas, or system contributions that make the work stand out.

SuperGlobal Reranking for query similarity optimization
Adaptive Bidirectional Temporal Search (ABTS)
Keyframe extraction and deduplication via image hashing
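
The page does not spell out how ABTS works, so the following is only one plausible reading of the name, offered as a sketch under stated assumptions. Starting from a pivot keyframe that matches the query, it expands left and right while frame-to-query similarity stays above a threshold set relative to the pivot's score (the "adaptive" part), and it tolerates a few consecutive low-scoring frames before stopping (the "robust" part). The function `bidirectional_search` and its `ratio` and `patience` parameters are hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def bidirectional_search(
    scores: List[float], pivot: int, ratio: float = 0.8, patience: int = 1
) -> Tuple[int, int]:
    """Expand from the pivot in both directions to estimate (start, end) frame indices.

    Hypothetical sketch: scores[i] is the query similarity of frame i.
    """
    # Threshold adapts to how strongly the pivot itself matched the query.
    threshold = ratio * scores[pivot]

    def expand(step: int) -> int:
        boundary = pivot
        misses = 0  # consecutive frames below the threshold
        i = pivot + step
        while 0 <= i < len(scores) and misses <= patience:
            if scores[i] >= threshold:
                boundary = i  # extend the moment and reset the miss counter
                misses = 0
            else:
                misses += 1
            i += step
        return boundary

    return expand(-1), expand(+1)


if __name__ == "__main__":
    scores = [0.1, 0.2, 0.7, 0.3, 0.8, 0.9, 0.85, 0.2, 0.1, 0.9]
    print(bidirectional_search(scores, pivot=5))  # (4, 6)
```

With `patience=1`, a single noisy frame inside a moment does not cut the segment short, while two low frames in a row end the expansion. Each direction stops early, so the cost scales with the length of the moment rather than the length of the video.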

Authors
Tinh-Anh Nguyen-Nhu
Ho Chi Minh University of Technology, VNU-HCM, Vietnam
Huu-Loc Tran
University of Information Technology, VNU-HCM, Vietnam
Nguyen-Khang Le
Japan Advanced Institute of Science and Technology
Deep Learning
Minh-Nhat Nguyen
University of Economics Ho Chi Minh City, Ho Chi Minh, Vietnam
Tien-Huy Nguyen
University of Information Technology, VNU-HCM, Vietnam
Hoang-Long Nguyen-Huu
University of Information Technology, VNU-HCM, Vietnam
Huu-Phong Phan-Nguyen
University of Information Technology, VNU-HCM, Vietnam
Huy Pham
Aarhus University
Robotics, Autonomous Navigation, Artificial Intelligence
Quan Nguyen
Posts and Telecommunications Institute of Technology, Hanoi, Vietnam
Hoang M. Le
York University, Canada
Q. Dinh
AI VIETNAM Lab