LOVO: Efficient Complex Object Query in Large-Scale Video Datasets

📅 2025-07-18
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the challenges of massive data volume, high latency, and poor generalization in complex object queries over large-scale video data, this paper proposes a query-agnostic efficient retrieval framework. Methodologically, it extracts visual embeddings from keyframes once and constructs an inverted multi-index structure; it integrates a pre-trained vision encoder, approximate nearest neighbor search in a vector database, and cross-modal re-ranking to enable flexible responses to unseen categories. The core innovation lies in decoupling index construction from query semantics, drastically reducing offline indexing overhead and online query latency. Evaluated on real-world video datasets, the framework achieves retrieval accuracy close to optimal while reducing end-to-end query latency by up to 85×. Its overall performance significantly surpasses existing state-of-the-art methods.

📝 Abstract
The widespread deployment of cameras has led to an exponential increase in video data, creating vast opportunities for applications such as traffic management and crime surveillance. However, querying specific objects from large-scale video datasets presents challenges, including (1) processing massive and continuously growing data volumes, (2) supporting complex query requirements, and (3) ensuring low-latency execution. Existing video analysis methods either adapt poorly to unseen object classes or suffer from high query latency. In this paper, we present LOVO, a novel system designed to efficiently handle comp$\underline{L}$ex $\underline{O}$bject queries in large-scale $\underline{V}$ide$\underline{O}$ datasets. Agnostic to user queries, LOVO performs one-time feature extraction using pre-trained visual encoders, generating compact visual embeddings for key frames to build an efficient index. These visual embeddings, along with their associated bounding boxes, are organized in an inverted multi-index structure within a vector database, which supports queries for any object. During the query phase, LOVO transforms object queries into query embeddings and conducts fast approximate nearest-neighbor searches on the visual embeddings. Finally, a cross-modal rerank refines the results by fusing visual features with detailed textual features. Evaluation on real-world video datasets demonstrates that LOVO outperforms existing methods in handling complex queries, with near-optimal query accuracy and up to 85x lower search latency, while significantly reducing index construction costs. This system redefines the state of the art for object queries in video analysis, setting a new benchmark for complex object queries with a novel, scalable, and efficient approach that excels in dynamic environments.
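The two-stage query flow described in the abstract (a coarse embedding search followed by a rerank) can be illustrated with a minimal sketch. This is not LOVO's implementation: the brute-force cosine search stands in for the approximate nearest-neighbor search of a vector database, and the names `search`, `rerank`, `detail_score`, and `alpha`, as well as the toy embeddings and scores, are assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(index, query_vec, k):
    """Stage 1: top-k entries by cosine similarity.

    Brute force here; a real system would use an ANN index instead.
    """
    scored = [(cosine(entry["embedding"], query_vec), entry) for entry in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def rerank(candidates, detail_score, alpha=0.5):
    """Stage 2: blend the coarse similarity with a finer score.

    detail_score is a hypothetical callable standing in for a
    cross-modal model that compares the candidate with the query text.
    """
    fused = [
        (alpha * sim + (1 - alpha) * detail_score(entry), entry)
        for sim, entry in candidates
    ]
    fused.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in fused]

# Toy index: one embedding and bounding box per detected region (made-up values).
index = [
    {"frame": 10, "box": (0, 0, 50, 50), "embedding": [1.0, 0.0, 0.0]},
    {"frame": 42, "box": (5, 5, 80, 60), "embedding": [0.9, 0.1, 0.0]},
    {"frame": 77, "box": (20, 10, 40, 90), "embedding": [0.0, 1.0, 0.0]},
]
query_vec = [1.0, 0.05, 0.0]  # would come from a text encoder in practice

candidates = search(index, query_vec, k=2)

# Hypothetical fine-grained scores keyed by frame number.
detail = {10: 0.2, 42: 0.9, 77: 0.5}
results = rerank(candidates, lambda entry: detail[entry["frame"]])
```

In this toy run the coarse stage keeps frames 10 and 42, and the rerank moves frame 42 ahead because its fine-grained score is higher. The weighted sum is only one possible fusion rule; the abstract does not specify how LOVO combines the two signals.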
Problem

Research questions and friction points this paper is trying to address.

Efficient querying of complex objects in large-scale video datasets
Handling massive and continuously growing video data volumes
Reducing query latency while maintaining high accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-time feature extraction using visual encoders
Inverted multi-index structure for efficient indexing
Cross-modal rerank fusing visual and textual features
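As a rough illustration of the inverted multi-index idea listed above, the sketch below splits each embedding into two halves, quantizes each half against its own small codebook, and stores the entry in the cell named by the resulting pair of codeword indices. This is the generic technique, not LOVO's actual index layout: the class name `InvertedMultiIndex`, the hand-picked codebooks (normally learned, for example with k-means), and the payload format are assumptions, and the lookup visits only the single nearest cell rather than several cells in order of distance.

```python
from collections import defaultdict

def nearest(codebook, sub):
    """Index of the codeword closest to sub (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - s) ** 2 for c, s in zip(codebook[i], sub)),
    )

class InvertedMultiIndex:
    """Toy two-codebook inverted multi-index (illustrative only)."""

    def __init__(self, codebook_a, codebook_b):
        self.codebook_a = codebook_a  # codewords for the first half
        self.codebook_b = codebook_b  # codewords for the second half
        self.cells = defaultdict(list)  # (i, j) -> list of (vector, payload)

    def _cell(self, vec):
        half = len(vec) // 2
        return (
            nearest(self.codebook_a, vec[:half]),
            nearest(self.codebook_b, vec[half:]),
        )

    def add(self, vec, payload):
        self.cells[self._cell(vec)].append((vec, payload))

    def lookup(self, query):
        """Return payloads stored in the cell the query falls into."""
        return [payload for _, payload in self.cells.get(self._cell(query), [])]

# Hand-picked codebooks for 4-dimensional vectors (two 2-d halves).
imi = InvertedMultiIndex([[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]])

# Payloads pair a frame number with a bounding box (made-up values).
imi.add([0.1, 0.0, 0.9, 1.0], (3, (0, 0, 20, 20)))
imi.add([0.9, 1.0, 0.1, 0.0], (8, (5, 5, 30, 30)))
imi.add([0.0, 0.2, 1.0, 0.8], (12, (10, 0, 40, 25)))

hits = imi.lookup([0.05, 0.1, 0.95, 0.9])
```

With two codebooks of size K, the index has K × K cells while storing only 2K codewords, which is what makes the partition fine-grained at low cost. Here `hits` contains the entries for frames 3 and 12, which share the query's cell; a candidate list like this would then be passed to a finer ranking stage.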
Yuxin Liu
Shanghai Jiao Tong University, China
Yuezhang Peng
Shanghai Jiao Tong University, China
Hefeng Zhou
Shanghai Jiao Tong University
AIEA
Hongze Liu
Shanghai Jiao Tong University, China
Xinyu Lu
Shanghai Jiao Tong University, China
Jiong Lou
Research Assistant Professor, Shanghai Jiao Tong University
Edge computing, Blockchain
Chentao Wu
Professor of Computer Science, Shanghai Jiao Tong University
Data Storage, Computer Systems, Computer Architecture, Cloud Computing, AI for Systems
Wei Zhao
Shenzhen University of Advanced Technology, China
Jie Li
Shanghai Jiao Tong University, China