Software Dev Engineer, EC2 Nitro

Amazon
Seattle, WA, USA | 2026-03-13 | Onsite

About the job

Join the EC2 Nitro Machine Learning Systems team to revolutionize accelerated computing in the cloud. We're seeking an exceptional Software Development Engineer to build and optimize the performance measurement infrastructure for some of the most computationally intensive AI/ML workloads on AWS. In this role, you'll establish EC2 as the definitive source of best-known configurations across diverse ML applications, including LLMs, multimodal models, and video generation workloads. Your expertise will directly influence future platform designs: you'll translate performance insights from state-of-the-art research and customer workloads into technical requirements for upcoming accelerated platform launches.

Responsibilities

Design and build foundational infrastructure for ML performance measurement that scales with business demand and runs as reliable CI/CD systems, delivering high-quality implementations that balance customer requirements with operational excellence

Develop comprehensive regression test coverage across all major component releases, including frameworks, firmware, drivers, and networking technologies, to maintain optimal platform performance

Collaborate with cross-functional teams to establish EC2 as the definitive source of best-known configurations across diverse ML applications, including LLMs, multimodal models, and MoE architectures

Document and communicate performance insights to influence future platform designs by translating technical findings from research and customer workloads into actionable recommendations

Identify and resolve complex performance challenges through systematic analysis of training and inference performance KPIs across accelerated platforms, working directly with customers to improve the efficiency of their ML systems

Qualifications

Minimum

3+ years of non-internship professional software development experience

2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems

Experience programming in at least one software programming language

Knowledge of Machine Learning and LLM fundamentals, including transformer architecture, training/inference lifecycles, and optimization techniques

Preferred

3+ years of experience across the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations

Bachelor's degree in computer science or equivalent

Knowledge of ML frameworks including JAX, PyTorch, vLLM, SGLang, Dynamo, TorchXLA, and TensorRT

Knowledge of machine learning model architecture and inference