Software Development Engineer, ML Systems, Annapurna Labs

About the job

You will join a dynamic team working at the cutting edge of the GenAI revolution by applying AI to AI. You will work on building agents, tools, and models to simplify and accelerate customer adoption of Neuron, the software stack supporting Amazon's Machine Learning silicon: Trainium. Partnering with external and internal customers, you will identify key obstacles and opportunities to accelerate their migration to AWS's ML silicon. You will be a key contributor driving impact by building AI agents and tools that simplify AWS Neuron adoption, which is critical to AWS's Generative AI business.

Responsibilities

- Research implementations that deliver the best possible experiences for customers.

- Deliver on goals to reduce the time and effort it takes to port and optimize Machine Learning workloads on Neuron.

- Solve challenging technical problems, often ones not solved before, at every layer of the stack.

- Design, implement, test, deploy and maintain innovative software solutions to transform service performance, durability, cost, and security.

- Build high-quality, highly available, always-on products.

- Potentially contribute intellectual property through patents.

Qualifications

Minimum

- 3+ years of non-internship professional software development experience

- 2+ years of non-internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems

- Experience programming with at least one software programming language

- Computer Science core: object-oriented design, data structures, and performance analysis with at least 2 programming languages.

- Experience in one or more of the following areas: ML compilers, production coding agents, GenAI model architecture, model training, neural network optimization, or applied math.

Preferred

- 3+ years of experience across the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations

- 2+ years in machine learning or other computational modeling environments with an emphasis on hosting, building or optimizing models for diverse hardware platforms

- Proven track record in building AI agents that automate ML workload optimization, ML compiler tuning, distributed inference and training, or ML kernel authoring and optimization

- Experience working with open-source software communities in the optimization space or related areas

- Knowledge of state-of-the-art technology used in the Machine Learning space and its mathematical underpinnings