Publications
- [arXiv Preprint] MorphServe: Efficient and Workload-Aware LLM Serving via Runtime Layer Swapping and KV Cache Resizing.
- [arXiv Preprint] λScale: Enabling Fast Scaling for Serverless Large Language Model Inference.
- [arXiv Preprint] ZenFlow: Enabling Stall-Free Offloading Training via Asynchronous Updates.
- [NSDI ’26] Towards Efficient LLM Storage Reduction via Tensor Deduplication and Delta Compression.
- [VLDB ’24] Everything You Always Wanted to Know About Storage Compressibility of Pre-Trained ML Models but Were Afraid to Ask.
- [DRBSD ’22] Understanding Impact of Lossy Compression on Derivative-related Metrics in Scientific Datasets.
Research Experience
- University of Virginia, Research/Teaching Assistant, Aug 2022 - Present
- Samsung Semiconductor, Research Intern, May 2024 - Aug 2024
- Argonne National Laboratory, Research Intern, May 2022 - Aug 2022
- George Mason University, Research/Teaching Assistant, Aug 2021 - May 2022
Education
- University of Virginia, Ph.D. in Computer Science, GPA 4.0, Aug 2022 - Present
- George Mason University, Ph.D. in Computer Science, GPA 4.0, Aug 2021 - Jul 2022
Background
I am a fifth-year Ph.D. student in the DS2 Lab at the University of Virginia, advised by Prof. Yue Cheng, where I work on systems for LLMs. My research focuses on building efficient and adaptive LLM systems, leveraging data-driven, data-reduction, and compression techniques to enhance the performance, scalability, and reliability of LLM inference, storage, and deployment at scale.