1. GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining;
2. NeuralGrok: Accelerate Grokking by Neural Gradient Transformation;
3. Dynamic Gradient Alignment for Online Data Mixing;
4. Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling;
5. HyperINF: Unleashing the HyperPower of the Schulz's Method for Data Influence Estimation;
6. MEDITRON-70B: Scaling Medical Pretraining for Large Language Models;
7. DOGE: Domain Reweighting with Generalization Estimation;
8. Irreducible Curriculum for Language Model Pretraining;
9. ReadingQuizMaker: A Human-NLP Collaborative System that Supports Instructors to Design High-Quality Reading Quiz Questions;
10. Towards Process-Oriented, Modular, and Versatile Question Generation that Meets Educational Needs;
11. Genetic Risk Converges on Regulatory Networks Mediating Early Type-2 Diabetes;
12. Historical OCR Text Quality Analysis and Post-correction.
Research Experience
1. Conducting Ph.D. research at EPFL, focusing on the training of large foundation models;
2. Contributed to multiple research projects spanning data selection, efficient training algorithms, and model generalization.
Education
1. Ph.D. candidate in Machine Learning at École Polytechnique Fédérale de Lausanne (EPFL), advised by Prof. Martin Jaggi;
2. B.Sc. (honors) in Computer Science at the University of Michigan, previously worked with Prof. Rada Mihalcea, Prof. Lu Wang, and Prof. Jie Liu;
3. B.Sc. (government honors) in Electrical and Computer Engineering at Shanghai Jiao Tong University.
Background
Research interests include the effective and efficient training of large foundation models, with emphasis on data selection and curriculum design, efficient pretraining and post-training algorithms, the training dynamics and generalization behavior of LLMs, and foundation models for scientific research (AI4Science).
Miscellany
Hobbies include skiing, photography, piano, singing, ballet, yoga, and tennis.