Sr. Machine Learning - Compiler Engineer III, AWS Neuron, Annapurna Labs

About the job

Do you want to be part of the AI revolution? At AWS, our vision is to make deep learning pervasive for everyday developers and to democratize access to cutting-edge infrastructure. To deliver on that vision, we’ve created innovative software and hardware solutions. AWS Neuron is the SDK that optimizes the performance of complex ML models executed on AWS Inferentia and Trainium, our custom chips designed to accelerate deep-learning workloads.

This role is for a senior software engineer on the Compiler team for AWS Neuron. In this role, you will be responsible for building the next-generation Neuron compiler, which transforms ML models written in ML frameworks (e.g., PyTorch, TensorFlow, and JAX) for deployment on AWS Inferentia- and Trainium-based servers in the Amazon cloud. You will solve hard compiler optimization problems to achieve optimal performance for a variety of ML model families, including massive-scale large language models such as Llama, DeepSeek, and beyond, as well as Stable Diffusion, vision transformers, and multi-modal models. You will need to understand how these models work inside out in order to make informed decisions on how best to coax the compiler into generating optimal instruction-level implementations. You will leverage your technical communication skills to partner with other teams, and you will be involved in pre-silicon design, bringing new products and features to market, and many other exciting projects.

Experience in object-oriented languages like C++ or Java is a must. Experience with compilers or with building ML models using ML frameworks on accelerators (e.g., GPUs) is preferred but not required. Experience with technologies like OpenXLA, StableHLO, and MLIR is an added bonus!

Responsibilities

You will design, implement, test, deploy, and maintain innovative software solutions to transform the Neuron compiler’s performance, stability, and user interface. You will work side by side with chip architects, runtime/OS engineers, scientists, and ML Apps teams to seamlessly deploy cutting-edge ML models from our customers on AWS accelerators with optimal cost/performance benefits. You will have the opportunity to become the public face of the Neuron compiler, working with open-source communities (e.g., StableHLO, OpenXLA, MLIR) and influencing industry-wide partners to pioneer the optimization of cutting-edge ML workloads on AWS software and hardware. You will also build innovative features that deliver the best possible experience for our customers: developers across the globe.

Qualifications

Minimum

5+ years of experience leading the design or architecture (design patterns, reliability, and scaling) of new and existing systems

2+ years of experience in developing compiler features and optimizations

Proficiency with 1 or more of the following programming languages: C++ (preferred), C, Python

Preferred

Master's or PhD degree in computer science or equivalent

Proficiency with resource management, scheduling, code generation, and compute graph optimization

Experience optimizing TensorFlow, PyTorch, or JAX deep learning models

Experience with multiple toolchains and Instruction Set Architectures