Deep Learning Kernel Software Performance Architect - New College Grad 2026

NVIDIA
US, CA, Santa Clara | 2026-04-16 | onsite

About the job

NVIDIA is seeking extraordinary architects to develop processor and system architectures that accelerate machine learning, data analytics, and high-performance computing applications. This position offers the chance to make a meaningful impact in a dynamic, technology-focused company.

Responsibilities

Validate and analyze performance of GPU-accelerated system and software architectures that advance the frontier of deep learning performance.

Debug deep learning and data analytics software to identify root causes of performance bottlenecks.

Develop scripts and tools to analyze, visualize, and debug software using analytical models, simulators, and test suites.

Work with the CUDA and AI Compiler teams to pinpoint and resolve performance issues.

Engage with AI/ML training and inference performance teams to identify and optimize critical deep learning layers.

Collaborate with hardware architecture performance teams to define expectations for emerging deep learning hardware features.

Qualifications

Minimum

Master's or PhD in Computer Science, Electrical Engineering or Computer Engineering, or equivalent experience.

Proven expertise in software design, including debugging, performance analysis, and test development.

Hands-on experience with practical parallel programming, even if it’s not on GPUs.

Strong understanding of computer architecture, with practical experience in performance debugging.

Ability to identify bottlenecks, optimize resource utilization, and enhance system throughput.

Fluency in programming languages such as Python, C, and C++.

Preferred

Strong foundation in machine learning and deep learning fundamentals to complement your expertise in computer architecture.

Strong background in high-performance, power-efficient design, energy-efficient high-performance computing, and performance analysis and profiling to identify bottlenecks.

Experience with GPU computing and parallel programming models.

Work experience with analytical performance modeling, profiling, and analysis.