Publications
Published several papers, including 'Plato: Plan to Efficiently Decode for Large Language Model Inference' (COLM 2025) and 'Compute Or Load KV Cache? Why Not Both?' (ICML 2025), and has contributed to projects such as Cake, Plato, HeterMoE, and Eagle.
Research Experience
Current research focuses on improving the efficiency of large language model (LLM) inference through the co-design of algorithms and system architectures. Broader research areas include machine learning systems and network systems.
Education
PhD in Computer Science and Engineering from the University of Michigan, 2020-2025, supervised by Prof. Z. Morley Mao; BEng in Computer Science from the School of the Gifted Young, University of Science and Technology of China, 2016-2020.
Background
Currently an Applied Scientist at Amazon, working on algorithms and systems for LLM post-training. Research interests lie at the intersection of machine learning systems and network systems, with a focus on making LLM inference more efficient through algorithm-system co-design.
Miscellany
Loves the movie 'Everything Everywhere All at Once' and believes that art is long-lasting while life is short.