Senior AI Software Engineer, Kernel Libraries

NVIDIA
US, CA, Santa Clara / Remote - US
2026-04-22

About the job

We're looking for outstanding AI systems engineers to develop groundbreaking technologies in the inference systems software stack! We build innovative AI systems software to accelerate AI inference. As a member of the team, you'll develop libraries, code generators, and GPU kernel technologies for NVIDIA's hardware architecture. This means designing and building things like new abstractions, efficient attention kernel implementations, new LLM inference runtime components, and kernel code generators to accelerate large language models, agents, and other high-impact AI workloads.

Responsibilities

Innovating and developing new AI systems technologies for efficient inference

Designing, implementing, and optimizing kernels for high-impact AI workloads

Designing and implementing extensible abstractions for LLM serving engines

Building efficient just-in-time, domain-specific compilers and runtimes

Collaborating closely with other engineers at NVIDIA across deep learning frameworks, libraries, kernels, and GPU architecture teams

Contributing to open source communities like FlashInfer, vLLM, and SGLang

Qualifications

Minimum

Master's degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience); PhD preferred

6+ years of (academic/industry) experience with ML/DL systems development preferred

Strong experience developing or using deep learning frameworks (e.g. PyTorch, JAX, TensorFlow, ONNX) and, ideally, inference engines and runtimes such as vLLM, SGLang, and MLC

Strong Python and C/C++ programming skills

Preferred

Background in domain-specific compiler and library solutions for LLM inference and training (e.g. FlashInfer, FlashAttention)

Expertise in inference engines like vLLM and SGLang

Expertise in machine learning compilers (e.g. Apache TVM, MLIR)

Strong experience in GPU kernel development and performance optimization (especially using CUDA C/C++, cuTile, Triton, or similar)

Open source project ownership or contributions