Publications: MHE-TPE micro-architecture accepted to MICRO 2025; SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs accepted to NeurIPS 2025; LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration accepted to ISCA 2025; Allo: A Programming Model for Composable Accelerator Design accepted to PLDI 2024; Exploring the Performance Improvement of Tensor Processing Engines through Transformation in the Bit-weight Dimension of MACs accepted to HPCA 2025; EN-T: Optimizing Tensor Computing Engines Performance via Encoder-Based Methodology accepted to ICCD 2024. Awards: Stars of Tomorrow award from Microsoft Research Asia.
Research Experience
Interned at Microsoft Research Asia, working with Dr. Shijie Cao on efficient systems for long-context LLMs; worked with Prof. Zhiru Zhang at Cornell University on domain-specific compilers for accelerator design.
Education
PhD student at the University of Washington, advised by Prof. Ang Li and Prof. Banghua Zhu; Bachelor of Science in Physics from the University of Science and Technology of China (USTC), recipient of the Guo Moruo Scholarship (the highest honor for USTC undergraduates).
Background
Research Interests: Developing efficient system support for large language models (LLMs). Field: Computer Science.
Miscellany
Service: Artifact Evaluation Committee member for MLSys 2025, ASPLOS 2025, HPCA 2025, and MICRO 2024; Conference Reviewer for ICLR 2025, ACL 2025, and NeurIPS 2024; Teaching Assistant for CSE 469: Computer Architecture (Spring 2025), University of Washington.