Publications
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference (ICML 2024); AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (MLSys 2024, Best Paper Award).
Projects
AWQ has received over 2,048 stars on GitHub and is integrated into Transformers, vLLM, FastChat, TensorRT-LLM, and TGI.
Research Experience
Undergraduate researcher at the SJTU EPCC Lab during my junior year, advised by Prof. Jingwen Leng.
Education
B.Eng. in Computer Science from Shanghai Jiao Tong University (ACM Honors Class), advised by Prof. Jingwen Leng. Ph.D. student at MIT EECS, advised by Prof. Song Han.
Background
Research Interests: efficient algorithms and systems for large language models. I am currently a first-year Ph.D. student at MIT EECS.