Software Development Engineer II, AI/ML Elastic Collectives - Annapurna Labs

Amazon
Cupertino, California, USA | 2026-01-29 | Onsite

About the job

We are seeking an experienced engineer to work on distributed AI/ML systems. This role involves working on collective operations, the fundamental operations that enable AI to scale across multiple accelerators and servers. Most of our stack is C/C++ and relatively low level, so solid knowledge of Linux, kernels, and performant code is important. Experience with embedded systems is valued, and experience with high-speed networking or HPC interconnects is valued highly.

Responsibilities

Work on distributed AI/ML systems

Develop collective operations enabling AI to scale across multiple accelerators and servers

Build networking solutions for Machine Learning (ML) and High-Performance Computing (HPC) workloads on AWS

Collaborate with infrastructure experts, hardware engineers, RTL engineers, scientists, and architects

Qualifications

Minimum

3+ years of non-internship professional software development experience

2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems

Experience programming with at least one programming language

Knowledge of Linux fundamentals

Preferred

Experience with embedded systems is valued, and experience with high-speed networking or HPC interconnects is valued highly.