About the job
The TILES (ToR Infrastructure Lifecycle Engineering and Scheduling) team's mission is to deliver scalable, on-time server-to-fabric connectivity, robust capacity management, and cross-fabric, network-aware scheduling.
Responsibilities
Develop and maintain critical software services that orchestrate the network for Google's next-generation hardware and AI/ML platforms.
Contribute to the design and implementation of scalable network models for machines, racks, and ToR-to-fabric topologies using technologies such as Unified Network Model (UNM) and Model X.
Build automation to streamline network capacity provisioning and enhance resource efficiency across Google's data centers.
Enhance the reliability and operational excellence of the ToR infrastructure, ensuring seamless network performance for all Google services.
Partner with teams across Cloud to deliver end-to-end networking solutions for new product introductions (NPIs) that enable AI/ML infrastructure.
Drive or contribute significantly to projects focused on introducing new rack switch hardware, improving the flexibility, efficiency, speed, and reliability of deployments, and boosting network availability.
Write high-quality, testable, and maintainable code, and participate actively in design and code reviews.
Qualifications
Minimum
Bachelor's degree or equivalent practical experience.
2 years of experience with software development in C++.
Preferred
Master's degree or PhD in Computer Science or a related technical field.
2 years of experience with distributed computing.
2 years of experience with data structures.
2 years of experience with SQL.
1 year of experience with data center networking.