Thien Huu Nguyen

Google Scholar ID: Da2FhegAAAAJ
University of Oregon
Research interests: Information Extraction, Deep Learning, Natural Language Processing, Machine Learning
Citations & Impact (all-time)
  • Citations: 8,605
  • H-index: 41
  • i10-index: 104
  • Publications: 20
  • Co-authors: 15
Academic Achievements
  • In 2024, released Vistral, a Vietnamese large language model based on Mistral 7B, which significantly outperforms ChatGPT on reliable Vietnamese LLM benchmarks
  • Led the development of CulturaX, a multilingual dataset of 6.3 trillion tokens across 167 languages, which Stability AI adopted to train Stable LM 2 1.6B
  • Created the Okapi framework for evaluating multilingual LLMs across 26 languages, integrated into EleutherAI's Language Model Evaluation Harness
  • Published a survey on recent advances in NLP via large pre-trained models in ACM Computing Surveys (Impact Factor: 14.324) in 2023
  • Conducted a comprehensive evaluation of ChatGPT across 7 tasks and 37 languages in 2023
  • Awarded the NSF CAREER Award in 2023 to support research on multilingual learning and information extraction