Released Vistral in 2024, a Vietnamese large language model based on Mistral 7B that significantly outperforms ChatGPT on reliable Vietnamese LLM benchmarks
Led the development of CulturaX, a multilingual dataset with 6.3 trillion tokens in 167 languages, adopted by Stability AI to train Stable LM 2 1.6B
Created the Okapi framework for evaluating multilingual LLMs across 26 languages, integrated into EleutherAI's Language Model Evaluation Harness
Published a survey paper on recent advances in NLP via large pre-trained language models, accepted by ACM Computing Surveys (Impact Factor: 14.324) in 2023
Conducted a comprehensive evaluation of ChatGPT across 7 tasks and 37 languages in 2023
Awarded the NSF CAREER Award in 2023 to support research on multilingual learning and information extraction