Hirofumi Inaguma

Google Scholar ID: 1oanW5sAAAAJ
Fundamental AI Research (FAIR) at Meta
Speech recognition · Speech translation · Multimodal
Citations & Impact
All-time
Citations: 3,175
h-index: 25
i10-index: 39
Publications: 20
Co-authors: 28
Resume (English only)
Academic Achievements
  • Nov 2021: Paper 'Alignment Knowledge Distillation for Online Streaming Attention-based Speech Recognition' accepted to IEEE/ACM TASLP (first author).
  • Sep 2021: Four papers accepted to ASRU2021, including 'Fast-MD' and 'A Comparative Study on Non-autoregressive Modelings for Speech-to-Text Generation' (multiple as first author).
  • Sep 2021: Preprint 'Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring' released (first author).
  • Jul 2021: Paper 'ESPnet-ST IWSLT 2021 Offline Speech Translation System' available (first author).
  • Jun 2021: Two papers accepted to INTERSPEECH2021: 'StableEmit' and 'VAD-free Streaming Hybrid CTC/Attention ASR' (both first author).
  • Mar 2021: Paper 'Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation' accepted to NAACL-HLT2021 (first author).
  • Jan 2021: Three papers accepted to ICASSP2021, including 'Orthros', 'Improved Mask-CTC', and 'Recent Developments on ESPnet Toolkit Boosted by Conformer' (some as first author).
  • Dec 2020: Received the 14th IEEE Signal Processing Society (SPS) Japan Student Conference Paper Award.
  • Jul 2020: Four papers accepted to INTERSPEECH2020, including 'CTC-synchronous Training for Monotonic Attention Model' and 'Enhancing Monotonic Multihead Attention for Streaming ASR' (both first author).
  • Key contributions include multilingual end-to-end speech translation (ASRU2019), ESPnet-ST (ACL2020), non-autoregressive speech translation (ICASSP2021, 2021), bidirectional sequence-level knowledge distillation (NAACL-HLT2021), minimum-latency training for streaming ASR (ICASSP2020), CTC-synchronous training (INTERSPEECH2020), StableEmit (INTERSPEECH2021), and VAD-free decoding (INTERSPEECH2021).