🤖 AI Summary
To address low retrieval efficiency and slow response to time-limited queries over large-scale lifelog video data, this paper presents a multimodal feature-fusion framework with interactive retrieval capabilities. Methodologically: (1) YOLO9000 is employed for fine-grained deep concept detection, complemented by optical character recognition (OCR) for understanding textual content in frames; (2) uniform temporal sampling is added as an alternative to the system's traditional shot-boundary segmentation, improving robustness of temporal coverage; (3) an integrated interactive interface supports feature map browsing, semantic concept search, multi-dimensional filtering, and hand-drawn sketch-based querying. The system, lifeXplore, builds on the authors' entries to the two previous Lifelog Search Challenge iterations and targets the time-limited query setting of the Lifelog Search Challenge 2020, aiming at fast, interpretable, and usable retrieval in real-world lifelog analysis scenarios. A sketch of the uniform sampling step follows.
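As a point of reference for step (2), the following is a minimal sketch of uniform temporal sampling of keyframes, assuming OpenCV for video decoding; the function name and the one-second interval are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: uniform temporal sampling of keyframes,
# used as an alternative to shot-boundary segmentation.
import cv2


def sample_keyframes(video_path: str, interval_sec: float = 1.0):
    """Yield (timestamp_sec, frame) pairs at a fixed temporal interval."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0   # fall back if FPS metadata is missing
    step = max(1, round(fps * interval_sec))  # frames between consecutive samples
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            yield index / fps, frame
        index += 1
    cap.release()
```

Unlike shot-boundary detection, this guarantees a bounded gap between indexed frames regardless of how visually uneventful the footage is, which is why it can improve temporal coverage on lifelog material.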
📝 Abstract
Since its first iteration in 2018, the Lifelog Search Challenge (LSC) - an interactive competition for retrieving lifelogging moments - has been co-located with the annual ACM International Conference on Multimedia Retrieval (ICMR) and has drawn international attention. With the goal of making an ever-growing public lifelogging dataset searchable, several teams develop systems for quickly solving time-limited queries during the challenge. Having participated in both previous LSC iterations, i.e. LSC2018 and LSC2019, we present our lifeXplore system - a video exploration and retrieval tool combining feature map browsing, concept search and filtering, as well as hand-drawn sketching. The system is improved by including an additional deep concept detector, YOLO9000, and optical character recognition (OCR), as well as by adding uniform sampling as an alternative to the system's traditional underlying shot segmentation.
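To make the indexing additions concrete, here is a minimal sketch of how per-keyframe concept labels and OCR text might be combined into searchable metadata. The `detect_concepts` callable stands in for a YOLO9000-style detector and is purely an assumption, as is the use of pytesseract for OCR; the abstract does not specify the actual pipeline.

```python
# Hypothetical sketch: fusing deep concept labels with OCR output per keyframe.
import pytesseract  # Tesseract OCR wrapper; an assumed choice, not the paper's


def index_keyframe(frame, detect_concepts):
    """Return searchable metadata for one sampled keyframe.

    detect_concepts: callable mapping an image array to a list of
    concept labels (stand-in for a YOLO9000-style detector).
    """
    concepts = detect_concepts(frame)          # e.g. ["laptop", "coffee cup"]
    text = pytesseract.image_to_string(frame)  # recognize any visible text
    return {"concepts": concepts, "ocr_text": text.strip()}
```

Indexing both concept labels and recognized text lets the interactive interface answer queries that reference either visual content or on-screen writing.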