About the job
Do you want to build the backbone of the Generative AI cloud at AWS? Do you want to build the future of the cloud for AI training and inference? Do you want to do industry-leading work delivering continuous price-performance improvements in the cloud for AI model training of multi-billion-parameter LLMs? Come join us in designing, delivering, and operating AWS cloud offerings that enable high performance and scalability for AI/ML and HPC workloads.
Responsibilities
You will be a technical leader solving complex architectural problems that may not have been defined beforehand. You will own the team's systems, proactively identifying deficiencies, writing tactical code to solve issues before they impact customers, and working with your team to scale the solution. You will decompose large, difficult server system testability, reliability, and diagnosis problems into straightforward tasks, components, or features that you will deliver yourself and, in parallel, lead others to deliver. You will draw on a combination of hardware, software, system design, x86 architecture, process, diagnosis, and operations knowledge.
Qualifications
Minimum
3+ years of programming experience with at least one modern language such as C++, C#, Java, Python, Golang, PowerShell, or Ruby
4+ years of non-internship professional software development experience
2+ years of experience designing or architecting (design patterns, reliability and scaling) new and existing systems
4+ years of systems development experience in an IT or data center environment
4+ years of experience deploying and operating in a Linux/Unix environment
2+ years of systems design, software development, operations, automation, and process improvement experience
Experience leading the design, build and deployment of complex and performant (reliable and scalable) software solutions in production
Preferred
You are knowledgeable about the full technical stack - vertically from bare-metal server hardware up to the software in userland, and everything in the middle. You have a tremendous interest in cloud scale and are curious about how systems and software decisions impact the user. You insist on the highest standards and are able to develop tactical solutions and tools to diagnose and fix issues. You are an excellent systems debugger, skilled at finding interaction issues between components on server systems. You are a leader with strong organizational, planning, and communication skills. You are a builder!