Software Development Engineer, ML Systems, Annapurna Labs

About the job

You will join a dynamic team working at the cutting edge of the GenAI revolution by applying AI to AI. You will work on building agents, tools, and models to simplify and accelerate customer adoption of Neuron, the software stack supporting Amazon's Machine Learning silicon: Trainium. Partnering with external and internal customers, you will identify key obstacles and opportunities to accelerate their migration to AWS's ML silicon. You will be a key contributor driving impact by building AI agents and tools that simplify AWS Neuron adoption, which is critical to AWS's Generative AI business.

Responsibilities

Research implementations that deliver the best possible experiences for customers.

Deliver on goals to reduce the time and effort it takes to port and optimize Machine Learning workloads on Neuron.

Solve challenging technical problems, often ones not solved before, at every layer of the stack.

Design, implement, test, deploy and maintain innovative software solutions to transform service performance, durability, cost, and security.

Build high-quality, highly available, always-on products.

Potentially contribute intellectual property through patents.

Qualifications

Minimum

3+ years of non-internship professional software development experience

2+ years of non-internship design or architecture (design patterns, reliability and scaling) of new and existing systems experience

Experience programming with at least one software programming language

Experience working with Data & AI related technologies, including, but not limited to, AI/ML, GenAI, Analytics, Database, and/or Storage

Experience working with customers and a passion for delivering exceptional service, or software development experience that demonstrates strong analytical skills, attention to detail, and effective communication

Computer Science core: object-oriented design, data structures, and performance analysis with at least 2 programming languages.

Experience in one or more of the following areas: ML compilers, production coding agents, GenAI model architecture, model training, neural network optimization, or, alternatively, applied math.

Preferred

3+ years of full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations experience

2+ years in machine learning or other computational modeling environments with an emphasis on hosting, building or optimizing models for diverse hardware platforms

Proven track record in building AI agents that automate ML workload optimization, ML compiler tuning, distributed inference and training, or ML kernel authoring and optimization

Experience working with open-source software communities in the optimization space or related areas

Knowledge of the state-of-the-art technology used in the Machine Learning space and its mathematical underpinning