Senior/Staff Engineer: Post-Silicon Bring-Up

About the job

In this exciting role, you will be responsible for the bring-up and optimization of Cerebras’s Wafer Scale Engine (WSE). A suitable candidate will have experience delivering end-to-end solutions while working closely with teams across chip design, system performance, software development, and productization.

Responsibilities

Develop and debug flows for Wafer Scale Engines that embed well-tested, deployable optimizations in production processes to reduce time and cost.

Refine AI systems across hardware/software design constraints such as di/dt, V-F characterization space, and current and temperature limits, as they relate to performance optimization.

Develop and enhance infrastructure to enable silicon for real-world workload testing.

Develop self-checking metrics, as well as instrumentation for debug and coverage

Work with the silicon architects/designers, performance engineers and software engineers to enhance performance of Wafer Scale Engines.

Work across domains such as software, design, verification, emulation, and validation to refine and optimize performance and process.

Work with CI/CD tools, Git repositories, GitHub, GitHub Actions/Jenkins, and merge and release flows to streamline testing and release.

Qualifications

Minimum

BS/BE/B.Tech or MS/M.Tech in EE, ECE, CS or equivalent work experience

7-10+ years of industry experience

3-5 years of experience in pre-silicon and post-silicon ASIC hardware

Good understanding of computer architecture and networking

Excellent coding skills in languages such as Python, Verilog, SystemVerilog, and C

Proficient in hardware/software co-design and layered architectures

Excellent debugging, analytical, and problem-solving skills

Proficient in large-scale testing and automation using Python and pytest

Good presentation skills, with the ability to distill diverse information and present optimization strategies and results

Good interpersonal skills, with the ability and desire to be a standout colleague

Proven track record of working cross-functionally, learning fast, and driving issues to closure

Preferred

Previous work in AI/ML on designs with 100+ CPU cores and a communication fabric

Familiarity with in-line testing and diagnostics that use CPU memory and self-checking execution

Knowledge of chip defect profiles and mitigation strategies across the hardware and software stack

Familiarity with creating test and software infrastructure at large scale

Experience working across global time zones