Senior Deep Learning Kernel Software Performance Architect

NVIDIA
US, CA, Santa Clara | 2026-01-13 | onsite

About the job

We are now looking for a Senior Kernel Performance Architect for Deep Learning Software! NVIDIA is seeking extraordinary architects to develop processor and system architectures that accelerate machine learning, data analytics and high-performance computing applications. This position offers the chance to create a meaningful impact in a dynamic, technology-focused company.

Responsibilities

Craft GPU-accelerated system architectures that push the boundaries of deep learning performance.

Prototype high-performance software for deep learning and data analytics workloads.

Analyze, visualize, and optimize software performance using analytical models, simulators, and test suites.

Collaborate closely across NVIDIA teams such as:

CUDA Compiler teams to identify performance issues.

AI/ML training and inference performance teams to identify and optimize critical deep learning layers.

Hardware architecture performance teams to define expectations for emerging deep learning hardware features.

Qualifications

Minimum

A Master's or PhD in Computer Science, Electrical Engineering or Computer Engineering, or equivalent experience.

5+ years of relevant industry or research experience.

A strong foundation in machine learning and deep learning fundamentals to complement your expertise in computer architecture.

A strong background in high-performance kernels (such as CUTLASS), with work experience in math library performance analysis and profiling to identify performance bottlenecks.

Fluency in programming languages such as Python, C, and C++.

Experience with GPU computing and parallel programming models.

Firsthand work experience with analytical performance modeling, profiling, and analysis.

Preferred

No preferred qualifications listed.