Who Can We Trust? Scope-Aware Video Moment Retrieval with Multi-Agent Conflict

📅 2025-10-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In video moment retrieval, existing methods ignore conflicts among the localization outputs of different models and struggle to reliably identify out-of-scope queries without additional training. To address these issues, we propose a reinforcement learning-based, conflict-aware multi-agent framework that localizes a moment and produces localization evidence in a single scan of the whole video. We design a multi-agent coordination mechanism that explicitly models localization disagreements, and we use evidential learning to formalize and resolve the conflicts. As a by-product, the approach can detect out-of-scope queries (queries with no corresponding moment in the video) without any extra training. Extensive experiments on benchmark datasets show that the method is effective compared with state-of-the-art approaches, and indicate that explicit conflict modeling and evidence-driven agent collaboration together improve retrieval performance and system trustworthiness.

📝 Abstract

Video moment retrieval uses a text query to locate a moment in a given untrimmed reference video. Locating video moments with text queries helps people interact with videos efficiently. Current solutions for this task do not consider conflicts among the localization results of different models, so the models cannot be integrated correctly to produce better results. This study introduces a reinforcement learning-based video moment retrieval model that scans the whole video once to find the moment's boundaries while producing evidence for its localization. Moreover, we propose a multi-agent system framework that uses evidential learning to resolve conflicts between the agents' localization outputs. As a by-product of observing and handling conflicts between agents, we can decide whether a query has no corresponding moment in a video (out-of-scope) without additional training, which makes the method suitable for real-world applications. Extensive experiments on benchmark datasets show the effectiveness of our proposed methods compared with state-of-the-art approaches. Furthermore, our results reveal that modeling competition and conflict in the multi-agent system is an effective way to improve RL performance in moment retrieval, and they show a new role for evidential learning in the multi-agent framework.
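
To make the idea of evidential conflict between agents concrete, here is a minimal sketch. It is not the paper's implementation: it assumes each agent emits non-negative evidence over a fixed set of candidate video segments, converts that evidence into a subjective-logic opinion (belief masses plus an uncertainty mass), and measures conflict as the belief mass two agents assign to different segments. The function names, the segment discretization, and the `conflict_threshold` used for the out-of-scope decision are all hypothetical.

```python
from typing import List, Tuple

Opinion = Tuple[List[float], float]  # (belief per segment, uncertainty mass)


def opinion(evidence: List[float]) -> Opinion:
    """Turn non-negative evidence into a subjective-logic opinion.

    With K segments and Dirichlet strength S = sum(evidence) + K,
    belief_k = evidence_k / S and uncertainty = K / S, so they sum to 1.
    """
    k = len(evidence)
    strength = sum(evidence) + k
    return [e / strength for e in evidence], k / strength


def conflict(b1: List[float], b2: List[float]) -> float:
    """Belief mass that the two agents place on *different* segments."""
    agreement = sum(x * y for x, y in zip(b1, b2))
    return sum(b1) * sum(b2) - agreement


def combine(op1: Opinion, op2: Opinion) -> Opinion:
    """Reduced Dempster combination of two opinions."""
    (b1, u1), (b2, u2) = op1, op2
    scale = 1.0 - conflict(b1, b2)  # always > 0 because u1, u2 > 0
    belief = [(x * y + x * u2 + y * u1) / scale for x, y in zip(b1, b2)]
    return belief, (u1 * u2) / scale


def is_out_of_scope(op1: Opinion, op2: Opinion,
                    conflict_threshold: float = 0.4) -> bool:
    """Hypothetical rule: strong disagreement suggests no matching moment."""
    return conflict(op1[0], op2[0]) > conflict_threshold


# Two agents scoring four candidate segments of the same video.
agent_a = opinion([9.0, 0.0, 0.0, 0.0])   # confident in segment 0
agent_b = opinion([0.0, 0.0, 0.0, 9.0])   # confident in segment 3
agent_c = opinion([9.0, 0.0, 0.0, 0.0])   # agrees with agent_a

fused_belief, fused_uncertainty = combine(agent_a, agent_c)
```

In this toy setup, two agents that agree reinforce each other (the fused uncertainty drops below either agent's own), while two confident agents that point at different segments produce high conflict, which the hypothetical threshold rule treats as a sign that the query may be out-of-scope. No training step is involved in that decision, which mirrors the training-free property described in the abstract.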

Problem

Research questions and friction points this paper is trying to address.

Resolving multi-agent localization conflicts in video retrieval

Detecting out-of-scope queries without additional training

Improving reinforcement learning performance through agent competition

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning model for video moment boundary detection

Multi-agent system that uses evidential learning to resolve conflicts

Out-of-scope query detection without additional training
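
For readers new to the task, boundary predictions in moment retrieval are commonly compared with the ground-truth span using temporal intersection-over-union (tIoU). The page does not say which metric or reward the paper uses, so the sketch below is only an assumed illustration of how a predicted `(start, end)` span could be scored; `temporal_iou` is a hypothetical helper name.

```python
from typing import Tuple

Span = Tuple[float, float]  # (start_seconds, end_seconds)


def temporal_iou(pred: Span, gt: Span) -> float:
    """Overlap of two time spans divided by the length of their union."""
    (ps, pe), (gs, ge) = pred, gt
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0


score = temporal_iou((10.0, 20.0), (15.0, 25.0))  # 5 s overlap, 15 s union
```

A score of 1.0 means the predicted boundaries match exactly, and 0.0 means the spans do not overlap at all.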

🔎 Similar Papers

Chrono: A Simple Blueprint for Representing Time in MLLMs