Currently a Staff Research Engineer at the Huawei Vancouver Research Center, focusing on optimizing large language model (LLM) inference and deployment on Huawei’s CloudMatrix SuperPod. Recent work centers on scaling LLM serving to SuperPod-scale infrastructure, addressing the challenges of running mixture-of-experts (MoE) models such as DeepSeek, Kimi, and Qwen across hundreds of interconnected NPUs. Previously contributed to the design and development of OptVerse, Huawei’s in-house large-scale optimization solver, with a focus on algorithmic innovations and system-level integration for real-world optimization tasks in the cloud.