arXiv preprint: 'Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling'
'CoffeeBoost: Gradient Boosting Native Conformal Inference for Bayesian Optimization' accepted to AAAI 2025
'Centrum: Model-based Database Auto-tuning with Minimal Distributional Assumptions' accepted to SIGMOD 2025
'Mirage: MoE + Decision Transformer for non-interrupting, non-overlapping resource provisioning in ML training' published at SC 2023
'Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning' published at NSDI 2023
Mentored students including Lynn Liu (HyperAPX, now a Ph.D. student at UC Berkeley), Rui Pan (Shockwave, now a Ph.D. student at Princeton), and Calvin Ma (Hound)
Background
Full-time Staff Researcher at Huawei Technologies
Research areas include autonomous system agents, distributed systems, and disaggregated datacenter architecture
Designs statistical and neural learning methods to model dynamics and uncertainty in large-scale distributed computer systems
Develops algorithmic decision-making mechanisms (e.g., convex/nonlinear optimization, Bayesian optimization, contextual bandits, reinforcement learning) to optimize system performance, efficiency, and scalability; a minimal Bayesian-optimization sketch appears after this list
Recent work includes black-box optimization for autonomous system tuning, real-time recommendation algorithms for preconditioner/solver selection in linear systems, dynamic market theory, and stochastic policy learning for scheduling and resource allocation
Works on high-performance ML training and inference, building multi-dimensional hybrid-parallelism solvers spanning data, tensor, pipeline, sequence, and expert parallelism, plus offload, rematerialization, and pipeline interleaving, to optimize token throughput, MFU (model FLOPs utilization), and scale-out linearity for hyper-scale LLM training on massive GPU/NPU clusters; a toy parallelism-planner sketch appears after this list
Researches dynamic MoE layers and real-time model/input pruning techniques to accelerate LLM inference; a top-k expert-routing sketch appears after this list
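
To make the decision-making toolkit above concrete, here is a minimal sketch of Bayesian optimization for tuning a single system knob. The Gaussian-process surrogate, LCB acquisition, kernel length-scale, and synthetic latency function are all illustrative assumptions, not the production tuner.

    # Minimal Bayesian-optimization loop for one system knob (e.g., a
    # normalized cache-size fraction). Everything here is an assumption
    # for illustration, not a real tuner.
    import numpy as np

    rng = np.random.default_rng(7)

    def latency(knob):
        # Hypothetical black-box response: p99 latency vs. knob value.
        return (knob - 0.65) ** 2 + 0.01 * rng.standard_normal()

    def rbf(a, b, length=0.15):
        # Squared-exponential kernel over normalized knob values.
        return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

    def gp_posterior(x_obs, y_obs, x_new, noise=1e-4):
        # Exact GP regression posterior mean and stddev.
        K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
        Ks = rbf(x_obs, x_new)
        mu = Ks.T @ np.linalg.solve(K, y_obs)
        var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
        return mu, np.sqrt(np.clip(var, 1e-12, None))

    candidates = np.linspace(0.0, 1.0, 201)
    x = list(rng.uniform(0, 1, size=3))   # a few random warm-up probes
    y = [latency(v) for v in x]
    for _ in range(12):
        mu, sd = gp_posterior(np.array(x), np.array(y), candidates)
        probe = candidates[np.argmin(mu - 2.0 * sd)]  # LCB, minimizing
        x.append(probe)
        y.append(latency(probe))
    print(f"best knob {x[int(np.argmin(y))]:.3f}, latency {min(y):.4f}")

The papers listed above (Centrum, CoffeeBoost) pursue roughly this style of model-based tuner while weakening its distributional assumptions.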
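
The hybrid-parallelism work can be pictured as a search over parallelism degrees. The toy planner below enumerates (dp, tp, pp) factorizations of the device count and ranks them with a simple analytic cost model; every constant (device count, FLOPs, bandwidths, bubble model) is an illustrative assumption rather than a measured number.

    # Toy hybrid-parallelism planner: pick (dp, tp, pp) by estimated
    # step time. All constants below are assumed for illustration.
    from itertools import product

    WORLD = 64            # accelerators in the job (assumed)
    STEP_FLOPS = 2.8e15   # FLOPs per training step (assumed)
    PEAK_FLOPS = 3.2e14   # per-device peak FLOP/s (assumed)
    PARAM_BYTES = 140e9   # fp16 weights of a ~70B model (assumed)
    LINK_BW = 150e9       # bytes/s per device for collectives (assumed)
    MICROBATCHES = 8      # pipeline microbatches per step (assumed)

    def step_time(dp, tp, pp):
        # Compute shrinks with total devices; tensor parallelism pays
        # per-layer all-reduces; pipeline parallelism pays a bubble;
        # data parallelism pays a ring all-reduce of its gradient shard.
        compute = STEP_FLOPS / (WORLD * PEAK_FLOPS * 0.45)  # 45% MFU assumed
        tp_comm = 0.004 * (tp - 1)
        pp_bubble = compute * (pp - 1) / (pp - 1 + MICROBATCHES)
        shard = PARAM_BYTES / (tp * pp)
        dp_comm = 0.0 if dp == 1 else 2 * shard * (dp - 1) / (dp * LINK_BW)
        return compute + tp_comm + pp_bubble + dp_comm

    plans = [p for p in product([1, 2, 4, 8, 16, 32, 64], repeat=3)
             if p[0] * p[1] * p[2] == WORLD]
    dp, tp, pp = min(plans, key=lambda p: step_time(*p))
    t = step_time(dp, tp, pp)
    mfu = STEP_FLOPS / (WORLD * PEAK_FLOPS * t)
    print(f"best (dp, tp, pp) = ({dp}, {tp}, {pp}); "
          f"step {t * 1e3:.0f} ms; est. MFU {mfu:.0%}")

A real solver would add the sequence and expert dimensions, offload and rematerialization choices, memory constraints, and profiled rather than assumed costs.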
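
The dynamic-MoE line of work builds on top-k expert routing. The sketch below shows the gating step in plain numpy; the shapes, expert count, and linear stand-ins for expert FFN blocks are illustrative assumptions, not the production model.

    # Top-k expert routing in a MoE layer: the router scores each token
    # against every expert, keeps the k best, and mixes their outputs.
    # All shapes and weights are random stand-ins for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    N_TOKENS, D_MODEL, N_EXPERTS, TOP_K = 8, 16, 4, 2

    tokens = rng.standard_normal((N_TOKENS, D_MODEL))
    w_gate = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.1
    # Each "expert" here is a tiny linear map standing in for an FFN block.
    experts = rng.standard_normal((N_EXPERTS, D_MODEL, D_MODEL)) * 0.1

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    logits = tokens @ w_gate                        # (tokens, experts)
    topk = np.argsort(logits, axis=-1)[:, -TOP_K:]  # k best experts/token
    weights = softmax(np.take_along_axis(logits, topk, axis=-1))

    out = np.zeros_like(tokens)
    for t in range(N_TOKENS):
        # Each token visits only its k selected experts; outputs are
        # mixed by the renormalized router weights.
        for slot in range(TOP_K):
            e = topk[t, slot]
            out[t] += weights[t, slot] * (tokens[t] @ experts[e])

    print("tokens routed per expert:",
          np.bincount(topk.ravel(), minlength=N_EXPERTS))

These per-token routing decisions are exactly what the Speculative MoE preprint above aims to predict ahead of time, so that tokens and experts can be pre-scheduled across devices with less communication.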