About the job
Do you want to build the backbone of the Generative AI cloud at AWS? Do you want to build the future of the cloud for AI training and inference? Do you want to do industry-leading work delivering continuous price-performance improvements in the cloud for training multi-billion-parameter LLMs? Come join us in designing, delivering and operating AWS cloud offerings that enable high performance and scalability in AI/ML and HPC workloads.
Responsibilities
You will be a technical leader solving complex architectural problems that may not be defined beforehand. You will own the team's systems and work proactively to identify deficiencies, write tactical code to solve issues before they impact customers, and work with your team to scale the solution. You will decompose big, difficult server system testability, reliability and diagnosis problems into straightforward tasks, components or features that you will deliver yourself and lead others to deliver in parallel. You will use a combination of hardware, software, system design, x86 architecture, process, diagnosis and operations knowledge. In this role you will create automation through agentic workflows. You’ll develop smart automation solutions, implement AI-driven tools and workflows, and be part of the AI transformation.
Qualifications
Minimum
2+ years of non-internship professional software development experience; 1+ years of experience designing or architecting (design patterns, reliability and scaling) new and existing systems; 7+ years of administrative experience in networking, storage systems and operating systems, along with hands-on systems engineering experience; Knowledge of systems engineering fundamentals (networking, storage, operating systems); Experience programming with at least one modern language such as C++, C#, Java, Python, Golang, PowerShell or Ruby
Preferred
Experience with PowerShell (preferred), Python, Ruby, or Java; Experience working in an Agile environment using the Scrum methodology