Academic Achievements
Published multiple papers at top-tier conferences including CVPR, ECCV, BMVC, ACM Multimedia, and WACV
BMVC 2022: Introduced the first weakly-supervised fingerspelling recognition method for British Sign Language and released a new benchmark dataset
ECCV 2022: Proposed scalable methods to densify automatic annotations in sign language videos
CVPR 2022: Developed a sub-word-level lip reading model with visual attention, greatly reducing word error rates
BMVC 2021: Proposed a transformer-based architecture for visual keyword spotting
WACV 2021: Introduced a novel audio-visual speech enhancement paradigm robust to visual corruptions
ACM Multimedia 2020 (Oral): Proposed a high-accuracy speech-to-lip generation architecture for in-the-wild scenarios
CVPR 2020: Achieved realistic speech synthesis from silent lip movements for a single speaker
ACM Multimedia 2019 (Oral): Proposed a 'face-to-face translation' pipeline that translates talking-face videos across languages while preserving pose and background
Research Experience
Conducting doctoral research at the Visual Geometry Group (VGG), University of Oxford, focusing on weakly-supervised vision-language tasks
Proposed a novel visual backbone for lip region tracking, significantly reducing word error rates in lip reading
Developed scalable methods to increase automatic annotation density in sign language videos (from 670K to 5M confident annotations)
Designed a novel architecture for accurate audio-driven lip-sync for any identity in the wild
Built an end-to-end system for lip-to-speech synthesis that preserves individual speaking styles