Senior Staff Software Architect, Accelerator Platforms

Google
Sunnyvale, CA, USA

About the job

As a Senior Staff Software Architect, you will architect and drive the software innovations that power Google's AI and HPC infrastructure. Your focus will be on the software stack above the firmware, including distributed systems, the Linux OS and networking, power management, and seamless integration with hardware through buses such as PCIe. You will set the technical direction and lead its execution, enabling massive-scale deployment of accelerators (e.g., GPUs and TPUs) for critical Google services and Google Cloud. Your work is fundamental to unlocking new frontiers in AI.

Responsibilities

Serve as the Tech Lead (TL), defining the architecture and technical roadmap for the software stack on our accelerator platforms.

Drive large-scale technical programs from concept to deployment, ensuring cross-team alignment and on-time delivery of complex systems. This includes working with hardware, software, and SRE teams to deliver scalable solutions for Google's data centers.

Guide multiple teams through the design, development, and execution of this roadmap.

Focus on distributed systems software, core Linux OS components, Linux networking, power management strategies, and the intricate interactions with hardware buses such as PCIe, USB, and I2C.

Qualifications

Minimum

Bachelor's degree in Computer Science, Electrical Engineering, a related technical field, or equivalent practical experience.

8 years of experience in software development.

5 years of experience in a technical leadership role.

Experience architecting and developing software for distributed systems, and programming in C or C++.

Experience with Linux OS internals, kernel development, or systems programming.

Experience with Linux networking concepts and development (e.g., sockets, TCP/IP, kernel networking stack).

Preferred

Experience with system-level power management techniques.

Experience with software development for accelerators (e.g., GPUs, TPUs) in data center environments.

Experience with low-level platform bring-up and debugging.

Experience technically leading and mentoring a team of engineers.

Familiarity with industry standardization bodies (e.g., PCI-SIG, Compute Express Link (CXL) Consortium, Distributed Management Task Force (DMTF), Open Compute Project (OCP)).

Knowledge of High-Performance Computing (HPC) systems and networking.