🤖 AI Summary
This study addresses the longstanding reliance on manual annotation for extracting subtitles and sung lyrics from low-quality televised Taiwanese opera (Gezai opera) recordings, a bottleneck that has hindered related research. The authors propose a two-stage approach. First, lyric segments are localized automatically by combining OCR-driven subtitle segmentation with speech-and-music activity detection (SMAD). Second, an interactive real-time OCR correction system refines the extracted text and keeps it aligned with the video timeline. This work presents the first integration of OCR and SMAD for audiovisual analysis of traditional Chinese opera and constructs the first Gezai opera lyric dataset with accurate timestamps, enabling downstream music information retrieval tasks such as lyric recognition and melody-based search.
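OCR-driven subtitle segmentation can be illustrated by grouping per-frame OCR readings into time-stamped subtitle segments. The sketch below is hypothetical and is not the authors' implementation. It assumes that OCR has already been run on sampled frames and returns `(timestamp, text)` pairs, with an empty string when no subtitle is visible. It also assumes that two readings count as the same subtitle when their `difflib` similarity ratio reaches a threshold, which tolerates small OCR errors. The names `Segment`, `similar`, and `segment_subtitles` and the 0.8 threshold are illustrative choices.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


@dataclass
class Segment:
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str     # subtitle text (first OCR reading of the segment)


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two OCR readings as the same subtitle if their similarity ratio is high enough."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def segment_subtitles(frames: List[Tuple[float, str]], frame_dur: float,
                      threshold: float = 0.8) -> List[Segment]:
    """Group per-frame OCR output (timestamp, text) into contiguous subtitle segments.

    Assumption: frames are sorted by time and sampled every `frame_dur` seconds;
    an empty string means no subtitle was detected in that frame.
    """
    segments: List[Segment] = []
    current: Optional[Segment] = None
    for t, text in frames:
        text = text.strip()
        # Same (or nearly the same) subtitle as the running segment: extend it.
        if current is not None and text and similar(current.text, text, threshold):
            current.end = t + frame_dur
            continue
        # Subtitle disappeared or changed: close the running segment.
        if current is not None:
            segments.append(current)
            current = None
        # A new subtitle starts in this frame.
        if text:
            current = Segment(start=t, end=t + frame_dur, text=text)
    if current is not None:
        segments.append(current)
    return segments
```

For example, with frames sampled every 0.5 s, the readings `"moon over the river"`, `"moon over the river"`, and `"moon over the rlver"` (toy strings with a simulated OCR error) merge into one segment spanning 0.5 to 2.0 s. Keeping the first reading is a simplification. A majority vote over the readings, or a manual fix in an interactive editor, would give cleaner text.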
📝 Abstract
Taiwanese opera (Kua-á-hì), a major form of local theatrical tradition, underwent extensive television adaptation, notably by pioneers such as Iûnn Lē-hua. These videos are potentially valuable for in-depth studies of Taiwanese opera, but they are often of low quality and require substantial manual effort during data preparation. To streamline this process, we developed an interactive system for real-time OCR correction and a two-step approach that integrates OCR-driven segmentation with Speech and Music Activity Detection (SMAD) to identify vocal segments in archival episodes efficiently and with high precision. The resulting dataset, consisting of vocal segments and their corresponding lyrics, can potentially support various MIR tasks such as lyrics identification and tune retrieval. Code is available at https://github.com/z-huang/ocr-subtitle-editor.
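One way to combine OCR-driven segments with SMAD is to keep only subtitle segments that are mostly covered by regions where SMAD detects voice and music together. This sketch is hypothetical. The abstract does not state the authors' combination rule, so the actual logic is in the linked repository. The code assumes SMAD output arrives as non-overlapping `(start, end, label)` intervals and that a `"speech+music"` label marks sung passages with accompaniment. The names `overlap` and `select_sung_segments`, the label string, and the 0.5 coverage threshold are illustrative assumptions.

```python
from typing import List, Tuple

Interval = Tuple[float, float]
Labeled = Tuple[float, float, str]  # (start seconds, end seconds, label or text)


def overlap(a: Interval, b: Interval) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def select_sung_segments(subtitles: List[Labeled], smad: List[Labeled],
                         label: str = "speech+music",
                         min_ratio: float = 0.5) -> List[Labeled]:
    """Keep subtitle segments whose duration is mostly covered by SMAD regions with `label`.

    Assumption: SMAD intervals do not overlap, so summing per-region overlaps
    does not double-count time.
    """
    regions = [(s, e) for s, e, lab in smad if lab == label]
    kept: List[Labeled] = []
    for start, end, text in subtitles:
        dur = end - start
        if dur <= 0:
            continue  # skip degenerate segments
        covered = sum(overlap((start, end), r) for r in regions)
        if covered / dur >= min_ratio:
            kept.append((start, end, text))
    return kept


subtitles = [(0.5, 2.0, "sung line"), (2.5, 3.5, "spoken line")]
smad = [(0.0, 1.8, "speech+music"), (1.8, 2.4, "music"), (2.4, 3.6, "speech")]
print(select_sung_segments(subtitles, smad))
```

In this toy input, the first segment is about 87% covered by a `"speech+music"` region and is kept. The second overlaps only plain speech and is dropped. Each kept `(start, end, text)` triple pairs a vocal segment with its lyric line, which is the shape of the dataset described above. A coverage ratio rather than any overlap at all makes the rule less sensitive to small boundary errors in either the OCR timestamps or the SMAD output.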