Neuron Collectives Software Engineer, Trainium Collectives

Amazon
Cupertino, CA, USA | 2026-02-16 | ONSITE

About the job

As a Neuron Collectives Software Developer, you will work on enhancing collective algorithms and topologies for optimal training performance, using tools like Neuron Explorer to identify bottlenecks, and optimizing collective operations to scale AI compute across the data center. You will also develop and optimize C/C++ implementations of collective communication patterns, and build and maintain analysis frameworks and automation solutions.

Responsibilities

Enhance collective algorithms and topologies for optimal training performance

Use tools like Neuron Explorer to identify bottlenecks in compute and bus bandwidth utilization

Monitor and analyze processor, DMA, firmware, and workload metrics

Optimize collective operations to scale AI compute across the data center

Work closely with the hardware team to co-optimize software and Trainium silicon

Develop and optimize C/C++ implementations of collective communication patterns

Investigate and implement improvements for specific training topologies used by modern LLMs

Build and maintain analysis frameworks and automation solutions

Qualifications

Minimum

3+ years of non-internship professional software development experience

2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience

Experience programming with at least one software programming language

Preferred

3+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience

Bachelor's degree in computer science or equivalent