GPU Software Architecture Engineer, Graphics, Games, & ML

About the job

The Apple Silicon GPU SW architecture team within the Media, Graphics & Compute Technologies group is seeking a senior/principal engineer to lead server-side ML acceleration and multi-node distribution initiatives. You will help define and shape our future GPU compute infrastructure on Private Cloud Compute, which enables Apple Intelligence.

Responsibilities

Design and implement tensor/data/expert parallelism strategies for large language model inference across distributed server cluster environments

Drive hardware and software roadmap decisions for ML acceleration

Design architectures that achieve peak compute utilization and optimal memory throughput

Develop and optimize distributed inference systems with focus on latency, throughput, and resource efficiency across multiple nodes

Architect scalable ML serving infrastructure supporting dynamic model sharding, load balancing, and fault tolerance

Collaborate with hardware teams on next-generation accelerator requirements and software teams on framework integration

Lead performance analysis and optimization of ML workloads, identifying bottlenecks in compute, memory, and network subsystems

Drive adoption of advanced parallelization techniques, including pipeline parallelism, expert parallelism, and other emerging approaches

Qualifications

Minimum

10+ years of experience in GPU programming (CUDA, ROCm) and high-performance computing, with a track record of optimizing large-scale parallel workloads

Strong experience with inter-node communication technologies (InfiniBand, RDMA, NCCL) in the context of ML training and inference

Excellent systems programming skills in C/C++

Deep understanding of distributed systems and parallel computing architectures

Understanding of how tensor frameworks (PyTorch, JAX, TensorFlow) are used in distributed training and inference

Bachelor's degree in Computer Science, Engineering, Mathematics, or a related technical field

Preferred

Familiarity with the model development lifecycle, from trained model to large-scale production inference deployment

Proven track record in ML infrastructure at scale

Python experience is a plus

PhD in Computer Science, Engineering, Mathematics, or a related technical field