AI Summary
This work addresses the challenges of British Sign Language (BSL) fingerspelling recognition, including rapid signing speed, frequent omission of letters by native signers, and the limited scale and inaccurate annotations of existing datasets. To overcome these issues, the authors construct FS23K, a large-scale, high-quality dataset, and introduce an iterative annotation framework to improve labeling accuracy. They further propose a multimodal recognition model that explicitly integrates hand-hand interaction and mouth movement cues, capturing the coordination between the two hands and the accompanying articulatory gestures. The proposed approach reduces the character error rate (CER) by 50% relative to the prior state of the art, advancing both sign language understanding and automated annotation.
Abstract
Fingerspelling is a critical component of British Sign Language (BSL), used to spell proper names, technical terms, and words that lack established lexical signs. Fingerspelling recognition is challenging due to the rapid pace of signing and common letter omissions by native signers, while existing BSL fingerspelling datasets are either small in scale or imprecise in their temporal and letter-level annotations. In this work, we introduce a new large-scale BSL fingerspelling dataset, FS23K, constructed using an iterative annotation framework. In addition, we propose a fingerspelling recognition model that explicitly accounts for bi-manual interactions and mouthing cues. As a result, with refined annotations, our approach halves the character error rate (CER) compared to the prior state of the art on fingerspelling recognition. These findings demonstrate the effectiveness of our method and highlight its potential to support future research in sign language understanding and scalable, automated annotation pipelines. The project page can be found at https://taeinkwon.com/projects/fs23k/.
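For readers unfamiliar with the reported metric: character error rate is conventionally defined as the Levenshtein (edit) distance between the predicted and reference character sequences, normalised by the reference length. The snippet below is a minimal illustrative sketch of that standard definition, not code from the paper; the function names `edit_distance` and `cer` are our own.

```python
def edit_distance(ref, hyp):
    # Dynamic-programming Levenshtein distance over characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,            # deletion
                            curr[j - 1] + 1,        # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    # Character error rate: edits normalised by reference length.
    return edit_distance(ref, hyp) / max(len(ref), 1)

# A hypothetical example: a signer omits a letter while spelling "smith".
print(cer("smith", "smth"))  # one deletion over five characters -> 0.2
```

"Halving the CER" thus means the model makes half as many character-level insertion, deletion, and substitution errors per reference character as the previous best system.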