About the job
Research Internships at Microsoft provide a dynamic environment for research careers with a network of world-class research labs led by globally-recognized scientists and engineers, who pursue innovation in a range of scientific and technical disciplines to help solve complex challenges in diverse fields, including computing, healthcare, economics, and the environment. The Systems Reliability Group at Microsoft Research is looking for motivated Research Interns to tackle cutting-edge challenges at the intersection of distributed systems, AI systems, and software engineering.
Responsibilities
Dive into real-world systems: Work with large-scale codebases, configurations, and deployments powering Microsoft Azure and Office 365.
Analyze production data: Discover how real cloud systems fail, and design strategies to prevent those failures.
Push the boundaries: Apply cutting-edge Large Language Model (LLM) and agentic technology to solve reliability challenges in cloud and AI systems.
Innovate in failure diagnosis and prevention: Build novel tools for monitoring, logging, and troubleshooting at scale.
Validate your ideas in the wild: Integrate and evaluate your solutions on real Microsoft services and incidents.
Qualifications
Minimum
Currently enrolled in a PhD program in Computer Science or a related STEM field.
Preferred
Experience building scalable and reliable systems.
Demonstrated ability to develop an original research agenda.
Ability to collaborate effectively with other researchers and product development teams.
Strong interpersonal skills, including cross-group and cross-cultural collaboration.
Ability to think unconventionally to derive creative and innovative solutions.