Senior DL Compiler Engineer - CUDA Tile

Nvidia
US, CA, Santa Clara / US, TX, Austin / US, TX, Remote
2026-03-11

About the job

NVIDIA's invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI (the next era of computing), with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world. Today, we are increasingly known as “the AI computing company.” We are hiring software engineers for the CUDA Tile team. NVIDIA GPUs are at the center of the deep learning revolution and continue to enable breakthroughs in generative AI, large language models, recommendation systems, speech recognition, image classification, and other areas. Come join us to work with a top-notch team and have a broad impact across the entire deep learning community.

Responsibilities

Design and implement compiler transformations

Develop MLIR-based dialects and lowering passes

Optimize the performance of tile-based kernels to ensure they execute efficiently across multiple generations of NVIDIA GPU architectures

Define public APIs

Craft and implement compiler and optimization techniques

Carry out performance optimization

Perform other general software engineering work

Qualifications

Minimum

Bachelor's, Master's, or Ph.D. degree in Computer Science, Computer Engineering, or a related field (or equivalent experience).

3+ years of relevant work or research experience in compiler optimization, performance analysis and IR design.

Ability to work independently, define project goals and scope, and lead your own development effort.

Excellent C/C++ programming and software design skills, including debugging, performance analysis, and test design.

Strong interpersonal skills and the ability to work in a dynamic, product-oriented team.

Preferred

Knowledge of CPU and/or GPU architecture, and CUDA or OpenCL programming experience.

Experience with MLIR, LLVM, XLA, and TVM, as well as with deep learning models and algorithms.