Cloud Platform Software Engineer

NVIDIA
US, WA, Seattle / US, CA, Santa Clara / Remote - US
2026-04-30

About the job

Are you passionate about Kubernetes and AI and want to help build the best platform for ML/AI infrastructure? Do you thrive when your work directly empowers teams to push the boundaries of what's possible? We're the Platform API team within NVIDIA's DGX Cloud organization - a collaborative group of cloud platform engineers, architects, and SREs who are passionate about building and nurturing the declarative, Kubernetes-native control plane that powers GPU-accelerated infrastructure across multiple cloud providers. Together, we're empowering the world's leading AI teams to train and deploy at datacenter scale.

Responsibilities

Develop software systems to support large-scale deployments of cloud infrastructure

Design and develop APIs to support Infrastructure as Code (IaC) automation and deployment workflows

Contribute to multiple source code projects to deliver software services that meet NVIDIA requirements

Collaborate with engineering managers, architects, designers, and frontend engineers to deliver high-quality software

Automate the validation of software solutions with unit and integration tests

Participate in the ownership and health of CI/CD pipelines from dev to production environments

Collaborate with other specialists for feedback on proposed designs and product direction

Openly share successes and failures in a no-blame environment

Qualifications

Minimum

BS in Computer Science, Information Systems, Computer Engineering or equivalent experience

5+ years of proven experience in large-scale software development

Experience building and shipping services on Kubernetes

Experience using and contributing to open-source projects

Experience collaborating with teams to write software that supports cloud services at scale

Programming experience in a relevant language, e.g. Golang, Python

Ability to communicate design and quality strategy in written, visual, and oral formats

Experience with a wide range of modern infrastructure tools and technologies

Preferred

Experience with Kubernetes Cluster API, Terraform, Tinkerbell, and other infrastructure tooling

Practical experience with Azure, GCP, or AWS

Ability to refactor software to run on platforms such as Kubernetes

Ability to discuss and work with CSI, CNI, and CRI, and/or familiarity with the CNCF and the tooling across its ecosystem

Upstream contributions to open-source projects