Software Development Engineer, Neuron Collectives, Annapurna Labs

Amazon
Cupertino, CA, USA | 2026-04-29 | Onsite

About the job

The AWS Neuron Collectives team is seeking a Software Development Engineer to optimize collective operations for AWS Trainium. Trainium is one of Amazon's highest-priority initiatives, powering the frontier AI models being trained today. Collectives are the critical operations that scale AI compute across the data center. You'll work in depth to optimize compute for the specific topologies used to train modern LLMs. Working closely with the hardware team, you'll push for maximum performance using C/C++, interfacing with DMA and firmware, and investigating detailed topologies. You'll analyze current collective algorithms using publicly accessible tools like Neuron Explorer and optimize them to fully utilize compute and bus bandwidth when scaling across the data center. This is a unique opportunity to impact how AI training runs at AWS scale while growing your technical breadth and depth.

Responsibilities

Enhance collective algorithms and topologies for optimal training performance

Use tools like Neuron Explorer to identify bottlenecks in compute and bus bandwidth utilization

Monitor and analyze processor, DMA, firmware, and workload metrics

Optimize collective operations to scale AI compute across the data center

Work closely with the hardware team to co-optimize software and Trainium silicon

Develop and optimize C/C++ implementations of collective communication patterns

Investigate and implement improvements for specific training topologies used by modern LLMs

Build and maintain analysis frameworks and automation solutions

Qualifications

Minimum

Experience building complex software systems that have been successfully delivered to customers

Experience contributing to the architecture and design (design patterns, reliability, and scaling) of new and current systems

Bachelor's degree in computer science or equivalent

Knowledge of engineering practices and patterns for the full software/hardware/networks development life cycle, including coding standards, code reviews, source control management, build processes, testing, certification, and livesite operations

Experience in development in the last 3 years, or experience in embedded development in C/C++

Preferred

Master's degree in computer science or equivalent

Experience with hardware/software integration and real-time systems

Familiarity with collective communication algorithms (e.g., all-reduce, all-gather) or distributed training frameworks