About the job
AMD AI Group is seeking a highly influential technical leader for OneROCm who will drive a unified ROCm software stack across AMD’s broad product portfolio, including Instinct, Radeon, Ryzen, Embedded, Game Consoles, and Autonomous Driving platforms. This is a rare opportunity to drive innovation and help develop next-generation products (NPI) at company-wide scale. The ideal candidate will shape the end-to-end ROCm software stack and influence every layer of it, spanning compilers, kernels, runtime, libraries, models, frameworks, and performance optimization layers. The role also requires strong hardware/software co-design to maximize performance across diverse AMD products and workloads. The ideal candidate is expected to be highly hands-on and to embrace agentic AI workflows.
Responsibilities
Hardware-Software Co-design: Collaborate across hardware architecture, compiler, math library, kernel, and framework teams to influence future silicon features based on evolving AI workload trends.
Strong Execution: Deliver innovations and the roadmap for the AI software stack across all AMD products, ensuring AMD remains the platform of choice for top-tier AI customers.
Workload Performance Engineering: Lead the profiling, analysis, and tuning of large-scale models (LLMs, Diffusion, Multimodal, and MoE) to ensure "out-of-the-box" performance excellence on AMD hardware.
Ecosystem Innovation: Drive the development of advanced tools and frameworks for performance estimation, modeling, and automated reporting.
Customer Engagement: Partner with top customers and hyperscalers to understand their unique workload requirements and deliver tailored architectural wins and software optimizations.
Community & Open Source: Mentor and inspire other engineers and contribute to ROCm open source.
Qualifications
Minimum
No minimum qualifications listed.
Preferred
Knowledge of GPU architectures and basic knowledge of CPU architecture
Experience in AI/ML software stack spanning compilers, kernels, runtime, libraries, models, frameworks, and performance optimization layers
Understanding of GPU programming platforms such as ROCm, CUDA, and OpenCL
Experience in hardware/software co-design and in building high-performance products across the full product lifecycle
Experience with operating systems (OS) and device driver development is a plus