Resume (English only)
Academic Achievements
'Beyond Semantic Entropy: Boosting LLM Uncertainty Quantification with Pairwise Semantic Similarity' accepted to ACL Findings 2025.
'Synthetic Text Generation for Training Large Language Models via Gradient Matching' accepted to ICML 2025.
'Mini-batch Coresets for Memory-efficient Language Model Training on Data Mixtures' accepted to ICLR 2025.
'Changing the Training Data Distribution to Reduce Simplicity Bias Improves In-distribution Generalization' accepted to NeurIPS 2024.
Research Experience
Before joining UCLA, I worked as an AI Resident at VinAI. I am currently a CS Ph.D. candidate involved in multiple research projects.
Education
I am a Ph.D. candidate in Computer Science at UCLA, advised by Professor Baharan Mirzasoleiman. Previously, I was an AI Resident at VinAI (now Qualcomm AI). I received my BS degree, summa cum laude, from Toyo University. Before that, I graduated from the High School for Gifted Students (Hanoi University of Science) and won a Silver Medal at IMO 2015.
Background
My research interests center on improving data quality to enhance the performance and efficiency of large (vision-)language models. Specifically, I work on synthetic data generation and data selection to optimize training, making these models more effective and accessible. Recently, I have also become interested in advancing reasoning via test-time scaling and RL training.
Miscellany
The best way to reach me is by email: nguyentuanhaidang (at) gmail (dot) com.