Released Vistral in 2024, a Vietnamese large language model based on Mistral 7B that significantly outperforms ChatGPT on reliable Vietnamese LLM benchmarks
Led the development of CulturaX, a multilingual dataset with 6.3 trillion tokens in 167 languages, adopted by Stability AI to train Stable LM 2 1.6B
Created the Okapi framework for evaluating multilingual LLMs across 26 languages, integrated into EleutherAI's Language Model Evaluation Harness
Published a survey paper on recent advances in NLP via large pre-trained language models, accepted by ACM Computing Surveys (Impact Factor: 14.324) in 2023
Conducted a comprehensive evaluation of ChatGPT across 7 tasks and 37 languages in 2023
Awarded the NSF CAREER Award in 2023 to support research on multilingual learning and information extraction