About the job
The Microsoft Azure AI Inference platform is the next-generation cloud business positioned to address the growing AI market. We offer a fully managed AI inference platform to accelerate the research, development, and operations of AI-powered intelligent solutions at scale. This team owns the hosting, optimization, and scaling of the inference stack for all Azure AI Foundry models, including the latest and greatest from OpenAI, Grok, DeepSeek, and other OSS models. You will be joining the AI Core Inferencing team, influencing the overall product, driving new features and platform capabilities from preview to General Availability, and tackling many exciting problems at the intersection of AI and cloud.
Responsibilities
Design and implement core inference infrastructure for serving frontier AI models in production.
Identify and drive improvements to the end-to-end inference performance and efficiency of state-of-the-art LLMs and GenAI models from OpenAI, Anthropic, and xAI hosted on AI Foundry.
Design and implement efficient load scheduling and balancing strategies by leveraging key insights into, and features of, the model and workload.
Scale the platform to support the growing inferencing demand and maintain high availability.
Deliver critical capabilities required to serve the latest and greatest GenAI models, such as GPT-5, Realtime audio, and Sora, and enable fast time to market for them.
Drive generic features to cater to the needs of customers such as GitHub, M365, Microsoft AI and third-party companies.
Collaborate with our partners, both internal and external.
Embody Microsoft's Culture and Values.
Qualifications
Minimum
Bachelor’s degree in Computer Science or a related technical field AND 2+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, or Golang, OR equivalent experience.
Preferred
Technical background with a solid foundation in software engineering principles, distributed computing, and system architecture.
Experience working on high-scale, reliable online systems.
Experience with real-time online services requiring low latency and high throughput.
Experience working with Layer 7 (L7) network proxies and gateways.
Knowledge of network architecture and concepts, including HTTP and TCP protocols, authentication, and session management.
Knowledge and experience with OSS, Docker, Kubernetes, C++, Golang, or equivalent programming languages.
Cross-team collaboration skills and the desire to collaborate in a team of researchers and developers.
Ability to independently lead projects.