About the job
Join our Silicon Validation team to validate next-generation machine learning accelerators that power AWS's cloud computing infrastructure. You'll work in a fast-paced, startup-like environment alongside some of the brightest minds in the industry on cutting-edge, internet-scale technology that directly impacts how customers use machine learning acceleration. We are changing the landscape of cloud infrastructure by accelerating custom silicon development and moving beyond traditional partnerships to dominate in AI training and inference.
Responsibilities
Developing comprehensive validation strategies and detailed test plans covering functional, performance, power, and stress testing from silicon bring-up to product release
Executing complex test plans from RTL simulation and emulation environments through physical silicon validation
Conducting hands-on silicon bring-up and debug in the lab using oscilloscopes, logic analyzers, and protocol analyzers
Validating ML accelerator performance, accuracy, and reliability using real-world neural network workloads
Building test infrastructure, CI/CD, and automated regression frameworks to enable efficient validation at scale
Collaborating across architecture, design, firmware, and software teams to triage failures and drive root cause analysis to closure
Reviewing test results, identifying patterns, and providing feedback to improve design quality and validation coverage
Supporting production systems in AWS data centers and addressing field issues as they arise
Qualifications
Minimum
Strong programming skills (Python, Lua, C/C++, Rust, Go, etc.)
A solid understanding of computer architecture
Experience with AWS services, cloud infrastructure, and/or firmware development (BIOS, BMC, drivers)
Validation experience in any of these areas: PCIe, HBM, GPUs, neural networks, ML HW architecture, and/or CI/CD
Familiarity with the validation lifecycle from RTL simulation (SystemVerilog/UVM, VCS, Questa, Xcelium) and emulation (Palladium, Zebu, Veloce) through silicon failure analysis and debug
Preferred
Experience with machine learning hardware/software architecture
Experience with CI/CD
Experience with EDA simulation or emulation