About the job
We are now looking for a Senior Kernel Performance Architect for Deep Learning Software! NVIDIA is seeking extraordinary architects to develop processor and system architectures that accelerate machine learning, data analytics and high-performance computing applications. This position offers the chance to create a meaningful impact in a dynamic, technology-focused company.
Responsibilities
Craft GPU-accelerated system architectures that push the boundaries of deep learning performance.
Prototype high-performance software for deep learning and data analytics workloads.
Analyze, visualize, and optimize software performance using analytical models, simulators, and test suites.
Collaborate closely across NVIDIA teams such as:
CUDA Compiler teams to identify performance issues.
AI/ML training and inference performance teams to identify and optimize critical deep learning layers.
Hardware architecture performance teams to define expectations for emerging deep learning hardware features.
Qualifications
Minimum
A Master's or PhD in Computer Science, Electrical Engineering or Computer Engineering, or equivalent experience.
5+ years of relevant industry or research experience.
A strong foundation in machine learning and deep learning fundamentals to complement your expertise in computer architecture.
A strong background in high-performance kernels (such as CUTLASS), plus work experience in math library performance analysis and profiling to identify performance bottlenecks.
Fluency in programming languages such as Python, C, and C++.
Experience and familiarity with GPU computing and parallel programming models.
Firsthand work experience with analytical performance modeling, profiling, and analysis.
Preferred
No preferred qualifications listed.