AI Test and Debug Engineer

AMD
Secaucus, New Jersey, United States
2026-03-05

About the job

We are seeking a collaborative and motivated Test Engineer with several years of experience writing test content and scripts to join our team. In this role, you will develop and execute methodologies and test content for system-level hardware, firmware, and software validation on machine learning systems built with AMD’s technologies. You will be responsible for debugging and for participating in cross-domain triage activities. You’ll work closely with architecture, design, and post-silicon teams to identify and resolve complex system issues and to improve future debug and test features. First and foremost, this position involves developing test content (integrating new tools, creating new scripts, etc.), along with debugging issues, refining validation processes, executing test cases, analyzing data, and driving system-level quality across AMD’s product portfolio. We welcome applicants from diverse backgrounds who bring curiosity, technical depth, and a commitment to innovation.

Responsibilities

Develop complex/critical test content as well as monitoring, debug, and root-cause SW mechanisms that others can use in their plans.
Be part of the team that debugs issues on the platform.
Be hands-on with the HW/FW/SW and able to craft tests, scripts, and experiments to root-cause complex issues that can appear at the cluster, rack, server, or GPU level.
Create and execute test procedures to validate the design of various system components on the next generation of machine learning architectures. This includes all components of servers, including CPU, GPU, memory, BIOS, BMC, IO, storage, networking, etc.
Translate system specs into a robust system integration test plan.
Make improvements to system-level integration test strategies, methodologies, and processes.
Develop and improve automation features according to requirements.
Monitor and analyze the execution of automated tests at scale (for hundreds or thousands of systems).
Take an active role in lab management activities, such as inventory control, lab compliance, and hardware installation/reconfiguration/rework.

Qualifications

Minimum

No minimum qualifications listed.

Preferred

Prior experience working on servers, notably HPC or ML platforms.
Familiarity with GPU systems and with scale-up and scale-out architectures is a plus.
Several years of experience writing software in languages such as Python to aid automation or debug is a must.
Post-silicon system integration and system testing experience, as well as proven debug experience, is a must.
Ability to validate at server, rack, and cluster level (scale-up and scale-out).
Experience with computer architecture concepts and silicon features.
Experience with installation and management of operating systems.
Fundamental knowledge of virtual machines and hypervisors.
Effective communication skills, including influencing and working across large multi-functional HW, SW, and architecture teams.