Senior/Staff Applied ML Engineer – AI/ML Evaluation & Simulation

About the job

We're building the next generation of AI evaluation systems, and we're looking for a hands-on engineer who can bridge ML, software, and product to make AI systems more measurable, testable, and trustworthy. As part of the AI/ML Evaluation organization, we're seeking a Senior or Staff-level Applied ML Engineer with strong software engineering skills and a solid understanding of machine learning. In this role, you'll help design and build intelligent systems that simulate complex interactions (including agentic workflows powered by LLMs), develop tools for extracting structured insights, and create robust evaluation datasets. You'll also contribute to building scalable platforms for simulation and behavior analysis. The role sits at the intersection of ML, engineering, and product, and is ideal for someone passionate about bringing clarity and rigor to real-world AI performance.

Responsibilities

Design and implement systems that simulate user-like interactions and workflows

Build tools and infrastructure to generate, manage, and analyze evaluation data

Develop scalable pipelines to extract structured insights from simulation outputs

Collaborate with scientists and engineers to instrument and assess model performance

Engineer reusable, testable components for experimentation and evaluation workflows

Help define and operationalize success metrics aligned with product and research goals

Qualifications

Minimum

8+ years of experience in software engineering, ML engineering, or applied ML roles

Proficiency in Python or another modern programming language (e.g., Java, Go, Swift)

Experience building and maintaining production-grade systems

Solid understanding of machine learning concepts, especially LLMs and their applications

Excellent communication and collaboration skills with cross-functional partners

Preferred

Experience working on AI evaluation systems, LLM-based simulations, or agentic AI frameworks

Background in building tools for data analysis, model evaluation, or synthetic data generation

Familiarity with metrics instrumentation and observability in ML systems

Experience designing pipelines for AI/ML workflows

Exposure to applied research, generative models, or real-time systems

Understanding of how model quality connects to product outcomes and user experience