Shijie Cao

Google Scholar ID: StqnQfsAAAAJ
Microsoft Research Asia
Efficient Deep Learning, Deep Learning System, Computer Architecture
Citations & Impact
All-time
Citations: 897
H-index: 14
i10-index: 14
Publications: 20
Co-authors: 16
Resume (English only)
Academic Achievements
  • Released the SeerAttention-R framework, which aims to improve the efficiency of long decoding in reasoning models.
  • bitnet.cpp accepted to ACL 2025.
  • LUT Tensor Core accepted to ISCA 2025.
  • SeerAttention achieved a 90% sparsity ratio with minimal perplexity loss, offering a 7.3× speedup over FlashAttention-2.
  • T-MAC accepted to EuroSys 2025.
  • BitDistiller accepted to the ACL 2024 main conference.
  • Released the BitBLAS and T-MAC libraries to support mixed-precision matrix multiplication.
  • Paper 'Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation' accepted to OSDI 2024.
  • Paper 'Pre-gated MoE' accepted to ISCA 2024.
Research Experience
  • Currently a senior researcher in the System Group at Microsoft Research Asia (MSRA). From 2015 to 2021, he was a long-term intern in MSRA's system area, mentored by Dr. Ningyi Xu and Dr. Lintao Zhang.
Education
  • Received a B.E. in Computer Science from Harbin Institute of Technology (HIT) in 2016 and a Ph.D. in Computer Science from HIT in 2021 through a joint Ph.D. program with MSRA. His Ph.D. was supervised by Dr. Ningyi Xu and Dr. Lintao Zhang.
Background
  • Research interests lie at the intersection of computer systems/architecture and deep learning, including domain-specific architectures, software-hardware co-design, and deep learning compression and acceleration. Recently, his research has focused on model-chip co-design for LLMs, with particular emphasis on low-bit quantization and sparsity techniques.
Miscellany
  • Actively seeking talented candidates for both full-time positions and research internships, year-round.