Finding Optimal Video Moment without Training: Gaussian Boundary Optimization for Weakly Supervised Video Grounding

📅 2026-02-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes a Gaussian Boundary Optimization (GBO) framework for accurately localizing video segments relevant to a given query sentence under weak supervision—where only video-sentence pairs are provided without precise temporal annotations. GBO uniquely formulates boundary prediction as an analytically solvable optimization problem, enabling direct derivation of optimal temporal boundaries from Gaussian parameters during inference while balancing proposal coverage and segment compactness. Notably, GBO requires no additional training and seamlessly integrates with both single-Gaussian and mixture-of-Gaussians proposal architectures, offering strong theoretical grounding and broad applicability. Extensive experiments demonstrate that GBO achieves state-of-the-art performance across multiple standard benchmarks, significantly improving temporal localization accuracy and confirming its effectiveness and generalization capability.

📝 Abstract

Weakly supervised temporal video grounding aims to localize query-relevant segments in untrimmed videos using only video-sentence pairs, without requiring ground-truth segment annotations that specify exact temporal boundaries. Recent approaches tackle this task by utilizing Gaussian-based temporal proposals to represent query-relevant segments. However, their inference strategies rely on heuristic mappings from Gaussian parameters to segment boundaries, resulting in suboptimal localization performance. To address this issue, we propose Gaussian Boundary Optimization (GBO), a novel inference framework that predicts segment boundaries by solving a principled optimization problem that balances proposal coverage and segment compactness. We derive a closed-form solution for this problem and rigorously analyze the optimality conditions under varying penalty regimes. Beyond its theoretical foundations, GBO offers several practical advantages: it is training-free and compatible with both single-Gaussian and mixture-based proposal architectures. Our experiments show that GBO significantly improves localization, achieving state-of-the-art results across standard benchmarks. Extensive experiments demonstrate the efficiency and generalizability of GBO across various proposal schemes. The code is available at https://github.com/sunoh-kim/gbo.
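
To make the coverage-versus-compactness trade-off concrete, here is a minimal sketch for a single Gaussian proposal. The objective is an illustrative assumption and is not taken from the paper: it rewards the Gaussian mass covered by the segment [s, e] and charges a linear penalty `lam` per unit of segment length, with time normalized to [0, 1]. The function name `gaussian_boundaries` and the parameters `mu`, `sigma`, and `lam` are hypothetical. Under this assumed objective the optimal segment is the set of points where the Gaussian density exceeds `lam`, which gives a closed-form half-width and a degenerate (zero-length) segment when the penalty exceeds the peak density.

```python
import math


def gaussian_boundaries(mu, sigma, lam):
    """Closed-form boundaries for an ASSUMED objective (not the paper's):

        maximize  integral_s^e N(t; mu, sigma) dt  -  lam * (e - s)
        subject to 0 <= s <= e <= 1

    The integrand N(t) - lam is positive exactly where the density exceeds
    lam, so the optimal segment is that region, clipped to [0, 1].
    """
    if sigma <= 0 or lam <= 0:
        raise ValueError("sigma and lam must be positive")

    # Peak density of the Gaussian, reached at t = mu.
    peak = 1.0 / (sigma * math.sqrt(2.0 * math.pi))

    # Heavy-penalty regime: no point has density above lam, so the best
    # segment has zero length; return the (clipped) center.
    if lam >= peak:
        center = min(max(mu, 0.0), 1.0)
        return center, center

    # Solve N(mu + half) = lam for the half-width.
    half = sigma * math.sqrt(2.0 * math.log(peak / lam))

    # Clip to the normalized video duration.
    return max(0.0, mu - half), min(1.0, mu + half)
```

In this sketch, a smaller `lam` widens the segment (favoring coverage) and a larger `lam` narrows it (favoring compactness). Setting `lam` to the density one standard deviation from the center recovers the segment `[mu - sigma, mu + sigma]`.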

Problem

Research questions and friction points this paper is trying to address.

weakly supervised video grounding

temporal localization

Gaussian proposals

boundary inference

video-sentence alignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian Boundary Optimization

Weakly Supervised Video Grounding

Training-Free Inference