About the job
We are seeking a Principal Product Manager/Architect to define and guide the technical architecture of Microsoft Foundry as the most reliable, scalable, and efficient AI inferencing platform in the industry. This role sits at the intersection of platform architecture, large-scale GPU fleet management, and strategic customer engagement, with end-to-end accountability for the product direction that shapes reliability, efficiency, and customer trust at global scale.
Responsibilities
1. Product Reliability
Own the product direction for Microsoft Foundry inference, with a primary mandate to make the platform the most reliable enterprise inferencing service available. This includes defining architectural standards for global serving, multi-region resiliency, automated failover, and platform-managed disaster recovery, evolving the system from customer-managed resilience to platform-managed global reliability. Drive architectural alignment across global routing, capacity pooling, observability, and control plane abstractions to ensure consistent availability, predictable recovery behavior, and simplified customer operations at scale. Partner with engineering, infrastructure, and security leaders to ensure that reliability targets, SLAs, SLOs, and recovery objectives are designed into the platform by default, not added as afterthoughts.
2. GPU Fleet Efficiency & Capacity
Set the product direction for GPU fleet efficiency and capacity management, guiding platform-level design decisions that maximize utilization, minimize fragmentation, and accelerate time-to-monetization of new hardware and models. This includes shaping the architecture for global capacity pooling, intelligent scheduling, fungibility across workloads, automated demand forecasting, and software-defined allocation, ensuring Foundry can scale with demand while operating within real-world supply constraints. The role will work closely with efficiency and infrastructure teams to translate deep optimization learnings into durable platform primitives, enabling sustained efficiency gains rather than one-off wins. The Product Manager/Architect is expected to influence architectural investments across inference utilization, model serving, and hardware/system performance, ensuring that efficiency
Qualifications
Minimum
Bachelor's Degree AND 10+ years of experience in product/service/program management or software development OR equivalent experience
Other Requirements
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Preferred
Proven technical leadership with deep experience designing and operating planet-scale distributed systems, preferably in cloud, AI, or high-performance compute platforms.
Proven track record owning end-to-end architecture for mission-critical services with strong availability, resilience, and operational guarantees.
Deep understanding of GPU-backed inference systems, capacity management, scheduling, and performance optimization at scale.
Demonstrated ability to engage credibly with strategic enterprise customers, solving complex architectural problems and influencing platform direction based on real-world needs.
Exceptional communication skills, with the ability to translate complex technical concepts into clear guidance for executives, partners, and customers.