AI Performance Library Architect

Intel
US, Oregon, Hillsboro / US, California, Folsom
2026-03-27
Full time

About the job

The Software and AI (SAI) organization is looking for a software development engineer to work on the oneDNN project (https://github.com/uxlfoundation/oneDNN). oneDNN is a complex, cross-platform, open-source software project focused on neural network performance. It is a critical and highly visible component of Intel's AI strategy, powering key AI applications including OpenVINO, TensorFlow, PyTorch, ONNX Runtime, and more. In this role, you will be responsible for the design, development, and maintenance of new functionality in oneDNN to enable performance-critical portions of AI workloads. You will also support software developers optimizing AI frameworks and workloads for Intel CPUs and GPUs, as well as the cross-platform ecosystem of AI software developers contributing to oneDNN.

Responsibilities

- Design, develop, and maintain new functionality in oneDNN to enable performance-critical portions of AI workloads.
- Support software developers optimizing AI frameworks and workloads for Intel CPUs and GPUs, as well as the cross-platform ecosystem of AI software developers contributing to oneDNN.

Qualifications

Minimum

Master’s degree in Mathematics, Physics, Computer Science, or a relevant STEM field, OR Ph.D. degree in Mathematics, Physics, Computer Science, or a relevant STEM field.

5+ years of experience in the following areas:
- C and C++
- Maintaining or contributing to open-source software projects
- Software library design and architecture
- Implementation of linear algebra algorithms (functions from BLAS, LAPACK, or PyTorch)
- Performance engineering and software performance optimizations
- Floating point arithmetic and numerical stability
- Software development on Linux
- Low-level performance optimizations using CUDA, x86 assembly or intrinsics, or OpenCL

Preferred

- 3+ years: Machine learning and deep learning algorithms, or high-performance computing (HPC) application development
- 3+ years: Floating point implementations of transcendental functions (sin, cos, tanh, elu, etc.)
- 1+ year: Algorithms for non-IEEE low-precision data types (bfloat16, fp8, fp4)
- 1+ year: AI-assisted software development