Has published multiple papers on large-scale distributed ML and HPC, such as 'MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production'; won the KDD 2020 Best Paper Award; and is a major contributor to several open-source projects, including verl, BytePS/ps-lite/Horovod, GluonNLP, and Apache MXNet.
Research Experience
Works at Bytedance Seed, focusing on optimizing training frameworks for LLMs and multimodal models. Previously worked on collective communication libraries for recommendation systems at Bytedance, and on Apache MXNet at Amazon, where responsibilities included training, inference, and BERT pre-training recipes in GluonNLP.
Education
Information not available
Background
Research Interests: LLM infrastructure optimization, including pre-training and post-training. Specialization: Distributed training, recommendation systems, natural language processing.
Miscellany
Served as a reviewer for AISTATS 2021, VLDB 2023, MLSys 2025 (Area Chair), ICLR 2025 (SCI-FM), COLM 2025, and NeurIPS 2025; received awards including the Soong Ching Ling Scholarships, Dean's Honors List, and HKUEAA Scholarships (top 0.1%).