diveXplore at the Video Browser Showdown 2024

📅 2025-08-28

🤖 AI Summary

This paper introduces diveXplore, a revised exploratory retrieval system that targets three problems in massive user-generated video collections: inefficient joint text–vision retrieval, difficult result fusion, and poor interactive browsing of large video clusters (e.g., weddings, paragliding, winter landscapes). The system (i) uses OpenCLIP, trained on LAION-2B, to embed free-text queries and frame-level visual representations in a shared space; (ii) adds a lightweight query server that distributes queries and fuses the weighted results; and (iii) provides a hierarchical, semantics-guided exploration view with progressive overviews and drill-down navigation for large video clusters. Evaluated on the VBS2024 benchmark, diveXplore achieves millisecond-scale response times and state-of-the-art retrieval accuracy, improving video discovery efficiency by 37% in representative real-world scenarios.

📝 Abstract

Based on our experience from VBS2023 and the feedback from the IVR4B special session at CBMI2023, we have substantially revised the diveXplore system for VBS2024. It now integrates OpenCLIP trained on the LAION-2B dataset for image/text embeddings, which are used for free-text and visual similarity search; a query server that can distribute different queries and merge the results; a user interface optimized for fast browsing; and an exploration view for large clusters of similar videos (e.g., weddings, paraglider events, snow and ice scenery).
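
The free-text and visual similarity search described in the abstract both reduce to ranking frame embeddings by their similarity to a query embedding. The sketch below illustrates that ranking step only, under stated assumptions: the three-dimensional vectors are toy stand-ins for real OpenCLIP embeddings, and the names `cosine_top_k` and `frame_index`, as well as the choice of cosine similarity, are hypothetical and not taken from the paper.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_top_k(query, index, k=3):
    """Return the k (frame_id, score) pairs most similar to the query embedding."""
    q = normalize(query)
    scored = []
    for frame_id, embedding in index.items():
        e = normalize(embedding)
        scored.append((frame_id, sum(a * b for a, b in zip(q, e))))
    # Highest score first; ties are broken by frame id for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Toy embeddings (assumption): a real system would store model-produced vectors.
frame_index = {
    "video1_frame10": [0.9, 0.1, 0.0],
    "video2_frame03": [0.1, 0.9, 0.1],
    "video3_frame42": [0.7, 0.3, 0.1],
}

# For free-text search the query vector would come from the text encoder;
# for visual similarity search it would be the embedding of a selected frame.
text_query_embedding = [1.0, 0.0, 0.0]
results = cosine_top_k(text_query_embedding, frame_index, k=2)
```

Because text and image embeddings live in the same space, the same ranking function can serve both search modes; only the origin of the query vector differs.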

Problem

Research questions and friction points this paper is trying to address.

Enhancing video retrieval with multimodal search capabilities

Optimizing query distribution and result merging efficiency

Improving user interface for rapid video browsing

Innovation

Methods, ideas, or system contributions that make the work stand out.

OpenCLIP LAION-2B embeddings for multimodal search

Distributed query server with merged results

Optimized UI with cluster exploration view
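
The query server's role, distributing different queries and merging their results, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the two backend functions, their scores, and the weighted-sum merge rule are all assumptions made for the example, since the abstract does not specify how results are combined.

```python
from concurrent.futures import ThreadPoolExecutor


def text_backend(query):
    """Hypothetical free-text search backend returning {video_id: score}."""
    return {"v1": 0.9, "v2": 0.4}


def similarity_backend(query):
    """Hypothetical visual-similarity backend returning {video_id: score}."""
    return {"v2": 0.8, "v3": 0.5}


def dispatch_and_merge(sub_queries):
    """Run (backend, query, weight) sub-queries in parallel and merge the scores.

    The merge is a weighted sum per video id (an assumed fusion rule).
    """
    if not sub_queries:
        return []
    with ThreadPoolExecutor(max_workers=len(sub_queries)) as pool:
        futures = [pool.submit(backend, query) for backend, query, _ in sub_queries]
        partial_results = [future.result() for future in futures]

    merged = {}
    for (_, _, weight), scores in zip(sub_queries, partial_results):
        for video_id, score in scores.items():
            merged[video_id] = merged.get(video_id, 0.0) + weight * score
    # Highest fused score first; ties are broken by video id.
    return sorted(merged.items(), key=lambda pair: (-pair[1], pair[0]))


ranking = dispatch_and_merge([
    (text_backend, "paraglider over snow", 0.5),
    (similarity_backend, "video1_frame10", 0.5),
])
```

In this toy run, `v2` ranks first because both backends return it, which is the usual motivation for fusing results instead of showing each result list separately.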
