Hirofumi Inaguma

Google Scholar ID: 1oanW5sAAAAJ
Fundamental AI Research (FAIR) at Meta
Speech recognition · Speech translation · Multimodal
Citations & Impact
All-time
Citations: 3,175
h-index: 25
i10-index: 39
Publications: 20
Co-authors: 28
Resume (English only)
Academic Achievements
  • Nov 2021: Paper 'Alignment Knowledge Distillation for Online Streaming Attention-based Speech Recognition' accepted to IEEE/ACM TASLP (first author).
  • Sep 2021: Four papers accepted to ASRU2021, including 'Fast-MD' and 'A Comparative Study on Non-autoregressive Modelings for Speech-to-Text Generation' (multiple as first author).
  • Sep 2021: Preprint 'Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring' released (first author).
  • Jul 2021: Paper 'ESPnet-ST IWSLT 2021 Offline Speech Translation System' available (first author).
  • Jun 2021: Two papers accepted to INTERSPEECH2021: 'StableEmit' and 'VAD-free Streaming Hybrid CTC/Attention ASR' (both first author).
  • Mar 2021: Paper 'Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation' accepted to NAACL-HLT2021 (first author).
  • Jan 2021: Three papers accepted to ICASSP2021, including 'Orthros', 'Improved Mask-CTC', and 'Recent Developments on ESPnet Toolkit Boosted by Conformer' (some as first author).
  • Dec 2020: Received the 14th IEEE Signal Processing Society (SPS) Japan Student Conference Paper Award.
  • Jul 2020: Four papers accepted to INTERSPEECH2020, including 'CTC-synchronous Training for Monotonic Attention Model' and 'Enhancing Monotonic Multihead Attention for Streaming ASR' (both first author).
  • Key contributions include multilingual end-to-end speech translation (ASRU2019), ESPnet-ST (ACL2020), non-autoregressive speech translation (ICASSP2021, 2021), bidirectional sequence-level knowledge distillation (NAACL-HLT2021), minimum-latency training for streaming ASR (ICASSP2020), CTC-synchronous training (INTERSPEECH2020), StableEmit (INTERSPEECH2021), and VAD-free decoding (INTERSPEECH2021).