News
- Released SeerAttention-R, a framework for improving the long-decoding efficiency of reasoning models.
- bitnet.cpp accepted to ACL 2025.
- LUT Tensor Core accepted to ISCA 2025.
- SeerAttention achieves a 90% sparsity ratio with minimal perplexity loss, delivering a 7.3× speedup over FlashAttention-2.
- T-MAC accepted to EuroSys 2025.
- BitDistiller accepted to the ACL 2024 main conference.
- Released the BitBLAS and T-MAC libraries for mixed-precision matrix multiplication.
- Paper "Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation" accepted to OSDI 2024.
- Paper "Pre-gated MoE" accepted to ISCA 2024.
Research Experience
Currently a senior researcher in the Systems Group at Microsoft Research Asia (MSRA). From 2015 to 2021, served as a long-term intern in MSRA's systems area, mentored by Dr. Ningyi Xu and Dr. Lintao Zhang.
Education
Received a B.E. in Computer Science from Harbin Institute of Technology (HIT) in 2016 and a Ph.D. in Computer Science from HIT in 2021 through a joint Ph.D. program with MSRA. He was supervised by Dr. Ningyi Xu and Dr. Lintao Zhang during his Ph.D.
Background
Research interests lie at the intersection of computer systems/architecture and deep learning, including domain-specific architectures, software-hardware co-design, and deep learning compression and acceleration. Recently, his research has focused on model-chip co-design for LLMs, with particular emphasis on low-bit quantization and sparsity techniques.
Miscellany
Actively recruiting talented candidates for both full-time positions and research internships throughout the year.