News
- Released SeerAttention-R, a framework for improving the long-decoding efficiency of reasoning models.
- bitnet.cpp accepted to ACL 2025.
- LUT Tensor Core accepted to ISCA 2025.
- SeerAttention achieves a 90% sparsity ratio with minimal perplexity loss, delivering a 7.3× speedup over FlashAttention-2.
- T-MAC accepted to EuroSys 2025.
- BitDistiller accepted to the ACL 2024 main conference.
- Released the BitBLAS and T-MAC libraries for mixed-precision matrix multiplication.
- Paper "Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation" accepted to OSDI 2024.
- Paper "Pre-gated MoE" accepted to ISCA 2024.
Research Experience
Currently a senior researcher in the Systems Group at Microsoft Research Asia (MSRA). From 2015 to 2021, served as a long-term intern in MSRA's systems area, mentored by Dr. Ningyi Xu and Dr. Lintao Zhang.
Education
Received a B.E. in Computer Science from Harbin Institute of Technology (HIT) in 2016 and a Ph.D. in Computer Science from HIT in 2021 through a joint Ph.D. program with MSRA. He was supervised by Dr. Ningyi Xu and Dr. Lintao Zhang during his Ph.D.
Background
Research interests lie at the intersection of computer systems/architecture and deep learning, including domain-specific architectures, software-hardware co-design, and deep learning compression and acceleration. Recently, his research has focused on model-chip co-design for LLMs, with particular emphasis on low-bit quantization and sparsity techniques.
Miscellany
Actively recruiting talented candidates for both full-time positions and research internships throughout the year.