Machine Learning Research Engineer, GenAI Applied ML

About the job

Lead applied ML engineering on Scale's Applied ML team, powering data infrastructure for leading agentic LLMs (ChatGPT, Gemini, Llama). You will build scalable multi-agent systems that validate agentic reasoning and behavior and scale human expertise. You will also drive research into why agents fail in real-world use despite strong benchmark results, and ship production fixes.

Responsibilities

Build and deploy multi-agent systems for agentic reasoning validation

Develop pipelines to detect errors and scale human judgment

Combine classical ML, LLMs, and multi-agent techniques for reliability

Lead research into agent failure modes and ship fixes

Use AI tools to speed prototyping and iteration

Build data-driven evaluations and deploy rapid improvements

Integrate systems into Scale's platform

Qualifications

Minimum

PhD or MSc in Computer Science, Mathematics, Statistics, or a related field

3+ years shipping scaled production ML systems

Demonstrated real-world impact

Mastery of PyTorch, TensorFlow, JAX, or scikit-learn

Deep expertise in agentic LLMs and multi-agent systems

Strong software engineering skills, including microservices on AWS or GCP

Track record of rapid, data-driven iteration

Proficiency using AI tools to accelerate work

Strong research depth with a practical bias

Excellent cross-functional communication

Preferred

Experience prototyping agent evaluation/reliability systems

Experience with human-in-the-loop or annotation pipelines

Open-source contributions in agents, evaluation, or alignment

Publications on agent reliability at venues such as NeurIPS, ICML, or ICLR