Senior High Performance AI Engineer

About the job

We are looking for an outstanding Senior High Performance AI Engineer to build groundbreaking multi-agent systems for the CUDA ecosystem. We build innovative agentic runtimes and compiler-integrated orchestration that work together with NVIDIA's software stack to provide comprehensive acceleration for modern agent workloads powered by foundation models. As a member of the team, you will develop new agent abstractions, GPU-centric runtimes, and compiler- or runtime-driven system solutions to accelerate agent planning, tool use, code generation, and other high-impact AI workloads. You will collaborate closely with internal NVIDIA software and hardware teams to push the latest developments into NVIDIA products.

Responsibilities

Design, build and optimize agentic AI systems for the CUDA ecosystem.

Co-design agentic system solutions with software, hardware and algorithm teams; influence and adopt new capabilities as they become available.

Develop reproducible, high-fidelity evaluation frameworks covering performance, quality and developer productivity.

Collaborate across the AI stack—from hardware through compilers/toolchains, kernels/libraries, frameworks, distributed training, and inference/serving—and with model/agent teams.

Qualifications

Minimum

Bachelor’s degree in Computer Science, Electrical Engineering, or related field (or equivalent experience); MS or PhD preferred.

6+ years of industry or academic experience in AI systems development; exposure to building foundation models, agents, or orchestration frameworks; hands-on experience with deep learning frameworks and modern inference stacks.

Strong C/C++ and Python programming skills; solid software engineering fundamentals.

Experience with GPU programming and performance optimization (CUDA or equivalent).

Preferred

Track record of building and evaluating deep learning models, coding agents, and developer tooling.

Demonstrated ability to optimize and deploy high-performance models, including on resource-constrained platforms.

Deep expertise in GPU performance optimizations, evidenced by benchmark wins or published results.

Publications or open-source leadership in deep learning, multi-agent systems, reinforcement learning, or AI systems; contributions to widely used repos or standards.