BioVL-QR: Egocentric Biochemical Vision-and-Language Dataset Using Micro QR Codes

📅 2024-04-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of recognizing laboratory equipment, reagents, and containers in biochemical instructional videos, a task complicated by cluttered environments and visually similar objects. The authors introduce BioVL-QR, the first egocentric biochemical experiment vision-and-language dataset, comprising 23 instructional videos, corresponding protocols, and vision-and-language alignments. Because manual object annotation is costly and time-consuming, they attach Micro QR codes to objects and propose a labeling method that combines a Micro QR code detector with an off-the-shelf hand object detector, compensating for the blur and occlusion caused by object manipulation. As an application, they tackle localizing procedural steps in instructional videos, and experiments show that Micro QR codes and the proposed method improve biochemical video understanding. Data and code are publicly available.

📝 Abstract
This paper introduces BioVL-QR, a biochemical vision-and-language dataset comprising 23 egocentric experiment videos, corresponding protocols, and vision-and-language alignments. A major challenge in understanding biochemical videos is detecting equipment, reagents, and containers because of the cluttered environment and indistinguishable objects. Previous studies assumed manual object annotation, which is costly and time-consuming. To address the issue, we focus on Micro QR Codes. However, detecting objects using only Micro QR Codes is still difficult due to blur and occlusion caused by object manipulation. To overcome this, we propose an object labeling method combining a Micro QR Code detector with an off-the-shelf hand object detector. As an application of the method and BioVL-QR, we tackled the task of localizing the procedural steps in an instructional video. The experimental results show that using Micro QR Codes and our method improves biochemical video understanding. Data and code are available through https://nishi10mo.github.io/BioVL-QR/
Problem

Research questions and friction points this paper is trying to address.

Detecting equipment, reagents, and containers efficiently in cluttered biochemical videos with visually similar objects.
Overcoming Micro QR Code detection failures caused by blur and occlusion during object manipulation.
Localizing procedural steps in egocentric instructional videos.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Micro QR Code detection
Hand object detector integration
Egocentric biochemical video analysis
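The labeling idea behind these contributions — prefer a decoded Micro QR code, but fall back to the hand object detector's box and a previously seen label when the code is blurred or occluded — can be sketched as follows. This is a minimal illustration, not the authors' implementation: `fuse_labels`, `Detection`, and the IoU threshold are assumed names and values, and in practice the inputs would come from a real Micro QR decoder and an off-the-shelf hand object detector.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

@dataclass
class Detection:
    box: Box
    label: Optional[str]  # decoded Micro QR payload, or None if unknown

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def fuse_labels(qr_dets, obj_boxes, last_seen, iou_thr=0.3):
    """Label manipulated-object boxes for one frame (illustrative sketch).

    qr_dets:   Detections from a Micro QR code detector (may be empty
               when the code is blurred or occluded).
    obj_boxes: boxes of in-hand objects from an off-the-shelf
               hand object detector (unlabeled).
    last_seen: label -> last known box, carrying identities through
               frames where the code is unreadable (updated in place).
    """
    labeled = []
    for box in obj_boxes:
        # 1) Prefer a Micro QR code decoded inside/near the object box.
        best = max(qr_dets, key=lambda q: iou(q.box, box), default=None)
        if best is not None and iou(best.box, box) >= iou_thr:
            labeled.append(Detection(box, best.label))
            last_seen[best.label] = box
            continue
        # 2) Otherwise fall back to the most-overlapping box that was
        #    labeled in an earlier frame.
        prev = max(last_seen.items(), key=lambda kv: iou(kv[1], box),
                   default=None)
        if prev is not None and iou(prev[1], box) >= iou_thr:
            labeled.append(Detection(box, prev[0]))
            last_seen[prev[0]] = box
        else:
            labeled.append(Detection(box, None))  # identity unknown
    return labeled
```

The fallback via `last_seen` is what lets a label survive frames where the Micro QR code is momentarily unreadable, which is the core benefit the method claims over QR detection alone.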
Authors

Taichi Nishimura — PlayStation (Multimedia, Computer Vision, Natural Language Processing)
Koki Yamamoto — Kyoto University
Yuto Haneji — Kyoto University
Keiya Kajimura — Kyoto University
Chihiro Nishiwaki — Osaka Medical and Pharmaceutical University
Eriko Daikoku — Osaka Medical and Pharmaceutical University
Natsuko Okuda — Osaka Medical and Pharmaceutical University
Fumihito Ono — Osaka Medical and Pharmaceutical University
Hirotaka Kameko — Assistant Professor, Kyoto University (Natural Language Processing, Game AI)
Shinsuke Mori — Kyoto University