Software Engineer III, AI/ML Infrastructure, TILES

Google
Sunnyvale, CA, USA

About the job

As a software engineer, you will work on a specific project critical to Google’s needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. In this role, you will join the Top-of-Rack (ToR) Infrastructure team to connect servers driving Google's global services, Cloud, and Artificial Intelligence/Machine Learning (AI/ML) platforms to the network. You will manage Top-of-Rack switch infrastructure and life-cycle engineering, acting as the nexus between compute and the network fabric.

Responsibilities

Develop and maintain software services that orchestrate the network for Google's next-generation hardware and AI/ML platforms.

Contribute to the design and implementation of scalable network models for machines, racks, and ToR-to-fabric topologies using technologies like UNM and Model X.

Build automation to streamline network capacity provisioning and enhance resource efficiency across Google's datacenters.

Enhance the reliability and operational excellence of the ToR infrastructure, ensuring seamless network performance for all Google services.

Partner with teams across Cloud to deliver end-to-end networking solutions for new product introductions (NPIs) that enable AI/ML infrastructure.

Drive or contribute significantly to projects focused on introducing new rack switch hardware, improving the flexibility, efficiency, speed, and reliability of deployments, and boosting network availability.

Qualifications

Minimum

Bachelor’s degree or equivalent practical experience.

2 years of experience with software development in C++, or 1 year of experience in an industry setting with an advanced degree.

Preferred

Master’s degree or PhD in Computer Science or a related technical field.

Experience with software development in C++, C, Go, or Java.

Experience with networking or distributed systems.