About the job
NVIDIA is at the center of the AI revolution that's transforming how people live, work, and interact with technology. Come join us to build high-performance, production-grade software that's at the core of next-generation AI systems.
Responsibilities
Crafting and implementing compiler optimization techniques for deep learning network graphs.
Designing novel graph partitioning and tensor sharding techniques for distributed training and inference.
Performance tuning and analysis.
Code generation for NVIDIA GPU backends using open-source compilers such as MLIR, LLVM, and OpenAI Triton.
Designing user-facing features in JAX and related libraries, along with other general software engineering work.
Working closely with GPU hardware engineering teams to design AI compiler software features for next-generation GPUs.
Qualifications
Minimum
Bachelor's, Master's, or Ph.D. in Computer Science, Computer Engineering, or a related field (or equivalent experience).
4+ years of relevant work or research experience in performance analysis and compiler optimizations.
Ability to work independently, define project goals and scope, and lead your own development effort while adopting clean software engineering and testing practices.
Excellent C/C++ programming and software design skills, including debugging, performance analysis, and test design.
Strong foundation in the architecture of CPUs, GPUs, or other high-performance hardware accelerators. Knowledge of high-performance computing and distributed programming.
Preferred
CUDA or OpenCL programming experience is desired but not required.
Experience working with deep learning frameworks such as JAX, PyTorch, or TensorFlow.
Extensive experience with CUDA or with GPUs in general.
Experience with open-source compilers such as XLA, LLVM, MLIR or TVM.
Strong interpersonal skills are required along with the ability to work in a dynamic product-oriented team. A history of mentoring junior engineers and interns is a bonus.