See More, Store Less: Memory-Efficient Resolution for Video Moment Retrieval

📅 2026-01-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two problems in video moment retrieval: memory bottlenecks caused by dense frame processing, and loss of critical information caused by sparse sampling. The authors propose SMORE, a framework that combines query-guided semantic encoding with adaptive frame compression. Using a multimodal large language model, SMORE applies query-aware importance modulation so that salient high-resolution video content is preserved within a constrained memory budget. The authors report state-of-the-art performance on three benchmarks: QVHighlights, Charades-STA, and ActivityNet-Captions.

📝 Abstract
Recent advances in Multimodal Large Language Models (MLLMs) have improved image recognition and reasoning, but video-related tasks remain challenging due to memory constraints from dense frame processing. Existing Video Moment Retrieval (VMR) methods rely on sparse frame sampling, which risks information loss, especially in lengthy videos. We propose SMORE (See MORE, store less), a framework that enhances memory efficiency while maintaining high information resolution. SMORE (1) uses query-guided captions to encode semantics aligned with user intent, (2) applies query-aware importance modulation to highlight relevant segments, and (3) adaptively compresses frames to preserve key content while reducing redundancy. This enables efficient video understanding without exceeding memory budgets. Experiments show that SMORE achieves state-of-the-art performance on the QVHighlights, Charades-STA, and ActivityNet-Captions benchmarks.
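
The abstract does not give implementation details, so the following is only an illustrative sketch of how steps (2) and (3) could fit together under a fixed memory budget. It is not the authors' method. The function names `modulate` and `allocate_tokens`, the softmax weighting, the temperature, and the per-frame token floor and cap are all assumptions made for this example. Per-frame query similarities are turned into importance weights, and a fixed token budget is then split across frames in proportion to those weights, so relevant frames keep more detail while every frame keeps at least a small footprint.

```python
import math


def modulate(similarities, temperature=0.5):
    """Turn query-to-frame similarities into importance weights that sum to 1.

    Hypothetical stand-in for query-aware importance modulation:
    a plain softmax with a temperature (an assumption, not from the paper).
    """
    if not similarities:
        return []
    peak = max(similarities)  # subtract the max for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in similarities]
    total = sum(exps)
    return [e / total for e in exps]


def allocate_tokens(weights, budget, min_tokens=4, max_tokens=48):
    """Split a fixed token budget across frames in proportion to importance.

    Hypothetical stand-in for adaptive frame compression: each frame keeps
    at least `min_tokens` and at most `max_tokens`. Tokens lost to rounding
    or to the cap are left unused, so the total never exceeds `budget`.
    """
    n = len(weights)
    if n == 0:
        return []
    if budget < n * min_tokens:
        raise ValueError("budget too small to keep every frame")
    remaining = budget - n * min_tokens
    total = sum(weights)
    # Fall back to a uniform split if all weights are zero.
    shares = [w / total for w in weights] if total > 0 else [1.0 / n] * n
    return [
        min_tokens + min(int(remaining * share), max_tokens - min_tokens)
        for share in shares
    ]


# Example: four frames, the second one matches the query best.
sims = [0.1, 0.9, 0.2, 0.1]
weights = modulate(sims)
allocation = allocate_tokens(weights, budget=64)
```

In this sketch the best-matching frame receives the largest share of the budget, and the total allocation stays within the 64-token limit regardless of how many frames are low-relevance.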
Problem

Research questions and friction points this paper is trying to address.

Video Moment Retrieval
Memory Efficiency
Frame Sampling
Multimodal Large Language Models
Information Loss
Innovation

Methods, ideas, or system contributions that make the work stand out.

Video Moment Retrieval
Memory Efficiency
Query-Guided Captioning
Adaptive Frame Compression
Multimodal Large Language Models
🔎 Similar Papers
No similar papers found.
Mingyu Jeon
Department of Artificial Intelligence, Chung-Ang University
Sungjin Han
Department of Artificial Intelligence, Chung-Ang University
Jinkwon Hwang
Department of Artificial Intelligence, Chung-Ang University
Minchol Kwon
Department of Artificial Intelligence, Chung-Ang University
Jonghee Kim
Electronics and Telecommunications Research Institute
Junyeong Kim
Assistant Professor, Department of AI, Chung-Ang University
AI, CV, NLP, Multimodal Reasoning