Paper on NatureLM-audio, the first large audio-language model for animal sounds, accepted at ICLR 2025.
Project MOSLA paper accepted at LREC-COLING 2024 and nominated for Best Paper.
ISPA paper accepted at XAI-SA Workshop (ICASSP 2024).
Co-authored AVES and BEANS papers accepted at ICASSP 2023.
GitHub Typo Corpus paper accepted at LREC 2020.
TEASPN framework paper accepted at EMNLP 2019 (system demonstration).
Research featured in TechCrunch and Quartz.
Developed GrammarTagger (neural multilingual grammar profiler) and EXPATS (explainable automated text scoring toolkit).
Research Experience
Research Lead at Earth Species Project since November 2021, leading projects on non-human communication decoding.
Former Machine Learning Engineer/Researcher at Duolingo; led the launch of the Japanese, Korean, and Chinese courses.
Collaborated with Mirai Translate (a Japan-based MT startup) and ACTNext (ACT’s R&D unit for educational research).
Co-developed an ultra fine-grained NER system with Studio Ousia that ranked #2 at TAC KBP 2019.
Co-authored the book 'Real-World Natural Language Processing'; currently writing a book on Japanese NLP with Paul O'Leary McCann.
Background
Currently a Research Lead at Earth Species Project, working on decoding non-human communication using AI/ML technologies.
Pioneering the field of Animal Language Processing (ALP), focusing on foundation models (e.g., NatureLM-audio, AVES) and benchmarks (e.g., BEANS) for non-human animals.
Fluent in Chinese, Japanese, and English; currently learning Korean and Lojban.
Passionate about connecting language and machine learning to help people learn languages.
Diagnosed with stage IV lung cancer in March 2023; currently stable but with a poor prognosis; raising funds for family and medical expenses.