Software Development Engineer, Neuron Runtime

About the job

As a Software Development Engineer on the Neuron Runtime team, you will work alongside a team of engineers to develop and maintain high-performance runtime libraries and drivers for machine learning applications and AI accelerators. You will work on the design, development, and deployment of the Neuron Runtime and other Neuron components. The profiler plays a crucial role for internal and external customers: it helps them optimize AI workloads on hardware platforms such as Trainium and Inferentia devices by providing deep insights into performance bottlenecks and system behavior. The role also involves improving the performance of ML kernels and ML frameworks.

Responsibilities

Manage the full development life cycle of the Neuron Runtime, ensuring scalability, reliability, and usability

Collaborate with cross-functional teams to ensure that our C++ compiler generates the key information customers need to understand and optimize performance on our custom hardware

Drive innovations that allow the profiler to support multiple frameworks, such as PyTorch, JAX, and XLA

Qualifications

Minimum

Experience architecting, building, and operating distributed systems with a focus on high availability and fault tolerance

Hands-on experience with AWS services (e.g., EC2, ECS, CloudWatch, S3, Lambda) in production environments

Track record of owning services end-to-end, including deployment, monitoring, alarming, on-call, and post-incident review

Preferred

3+ years of experience across the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations

Bachelor's degree in computer science or equivalent