Ennan Zhai

Google Scholar ID: TBgb2BsAAAAJ
Alibaba Group
Computer Networks, Security, Programming Languages, Cloud Computing
Citations & Impact (all-time)
  • Citations: 1,490
  • H-index: 21
  • i10-index: 37
  • Publications: 20
  • Co-authors: 0
Resume
Academic Achievements
  Selected publications include:
  • Aegaeon: Effective GPU Pooling for Concurrent LLM Serving on the Market (SOSP'25)
  • SyCCL: Exploiting Symmetry for Efficient Collective Communication Scheduling (SIGCOMM'25)
  • Towards LLM-Based Failure Localization in Production-Scale Networks (SIGCOMM'25)
  • New Evolution of Hoyan: Enhancing Scalability, Usability, and Accuracy for Alibaba's Global WAN Verification (SIGCOMM'25)
  • Alibaba Stellar: A New Generation RDMA Network for Cloud AI (SIGCOMM'25)
  • SkyNet: Analyzing Alert Flooding from Severe Network Failures in Large Cloud Infrastructures (SIGCOMM'25)
Research Experience
  • Before joining Alibaba, he was a research scientist and lecturer in the Computer Science Department at Yale University until June 2018. During that time, he worked with Ruzica Piskac, Mahesh Balakrishnan, and Avi Silberschatz on cloud failure auditing systems, and with Joan Feigenbaum on tracking-resistant anonymous systems. He also taught the Building Distributed Systems course.
Education
  • He received his Ph.D. from Yale University in 2015 under the supervision of Bryan Ford. His dissertation focused on building the first cloud-reliability auditing system, Independence-as-a-Service (INDaaS), which proactively detects deep, unexpected dependencies that can cause correlated failures at cloud scale. This work was published at OSDI'14.
Background
  • He is currently Director of Network Research at Alibaba Cloud. His research focuses on building high-performance, reliable network systems for AI and the cloud, with particular emphasis on networks for AI and AI for networks.
Co-authors
  • 0 total (list not available)