MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs (ASPLOS 2025)
GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism (ASPLOS 2025)
Fairness in Serving Large Language Models (OSDI 2024)
S-LoRA: Serving Thousands of Concurrent LoRA Adapters (MLSys 2024)
Accelerating Data Serialization/Deserialization Protocols with In-Network Compute (ExaMPI@SC 2022)
AdaM: An Adaptive Fine-Grained Scheme for Distributed Metadata Management (ICPP 2019)
Background
I am mainly interested in accelerating and optimizing computation (especially ML workloads) on large-scale heterogeneous systems. I am currently looking for research interns to work on efficient RL training systems or agent training.