About the job
NVIDIA is looking for senior engineers who are obsessed with performance analysis and optimization to help us squeeze every last clock cycle out of AI training, the workload driving the design and construction of the largest and most powerful compute systems in the world. If you are willing to work across all layers of the hardware/software stack - from GPU architecture to application code - to achieve peak performance, we want to hear from you. This role offers the opportunity to directly impact the hardware and software roadmap at a fast-growing technology company that leads the AI revolution. Join us and help design and build the world's most powerful compute systems!
Responsibilities
Understand, analyze, profile, and optimize AI training workloads on new hardware and software platforms, identifying fundamental performance limiters.
Prioritize and solve performance issues across the key AI model training tasks, with the goal of pushing the end-to-end performance towards the physical limits.
Implement production-quality software across multiple layers of NVIDIA's deep learning platform stack, from drivers to DL frameworks.
Build and support NVIDIA submissions for MLPerf Training benchmarks.
Implement key DL training workloads in NVIDIA's proprietary processor and system simulators to enable future architecture studies.
Develop tools to automate workload analysis, optimization, and other critical workflows.
Qualifications
Minimum
PhD in CS, EE, or CSEE (or equivalent experience) with 5+ years of relevant experience, or MS with 8+ years of relevant experience.
Strong background in deep learning and neural networks, particularly in training.
Solid understanding of computer architecture and familiarity with GPU fundamentals.
Proven background in analyzing and tuning application performance.
Proven experience with processor and system-level performance modeling.
Proficiency in programming with C++, Python, and CUDA.
Preferred
No preferred qualifications listed.