Staff AI Scientist · Fiddler AI

About the job

Our Purpose

At Fiddler, we understand the implications of AI and the impact that it has on human lives. Our company was born with the mission of building trust into AI. The rise of Generative AI and Agents has unlocked generalized intelligence but also widened the risk aperture and made it harder to ensure that AI applications are working well. Fiddler enables organizations to get ahead of these issues by helping them deploy trustworthy and transparent AI solutions. Fiddler partners with AI-first organizations to help build a long-term framework for responsible AI practices, which, in turn, builds trust with their user base. AI engineers, data scientists, and business teams use Fiddler AI to monitor, evaluate, secure, analyze, and improve their AI solutions to drive better outcomes. Our platform enables engineering teams and business stakeholders alike to understand the 'what', 'why', and 'how' behind AI outcomes.

Responsibilities

- Lead applied research and development for the models and datasets at the core of Fiddler's Trust Service and the suite of guardrail classifiers and evaluators that customers depend on to keep their LLM and agentic applications safe, accurate, and compliant in production.

- Partner closely with other engineering teams, Product, and Customer Success. You’ll build strong relationships with customer data science and ML engineering teams, supporting their AI observability journey and ensuring they realize measurable value from Fiddler.

- Design, train, and ship production classifiers for safety, security, and quality detection (e.g., prompt injection, jailbreaks, PII, hallucination, faithfulness) under strict latency and cost constraints.

- Lead the development of synthetic and adversarial dataset pipelines, including novel methods for generating, filtering, and validating data that exposes failure modes our models need to learn.

- Drive the technical direction of generative insights – the LLM- and agent-powered analysis layer that helps customers diagnose what's going wrong in their AI applications and why.

- Contribute to the evaluation and experimentation infrastructure that lets the AI Science team and our customers reliably measure model quality, regression, and drift across rapidly evolving model populations.

- Explore reinforcement learning and preference-based methods where they offer real leverage over supervised baselines.

- Collaborate with Backend and Platform engineers to take research prototypes from notebook to a hardened, scaled, observable service.

- Partner with Product, Solutions Engineering, and Customer Success to translate enterprise customer needs into research problems and translate research results back into product.

- Mentor AI Scientists on the team, raise the technical bar through code review and design review, and represent Fiddler externally through publications, talks, or open-source contributions when appropriate.

Qualifications

Minimum

7+ years of applied AI experience, with a strong track record of taking models from research to production

Experience with LLM or agentic evaluations and guardrails

Deep expertise in training and fine-tuning classifier models, including modern encoder architectures (BERT-family, ModernBERT, etc.) and LLM-as-classifier approaches; clear understanding of the tradeoffs between them

Hands-on experience with dataset development as a first-class engineering discipline: sourcing, labeling, synthetic generation, adversarial augmentation, and quality control

Strong applied experience with LLMs and agentic systems – prompting, fine-tuning, and evaluation

Proficiency in Python and the modern ML stack (PyTorch, Hugging Face, common training/serving frameworks)

Comfortable working in production environments and partnering with backend and platform engineers on real-time inference, monitoring, and rollout

Excited by the prospect of using AI coding tools in your own workflow to push the limits of what one engineer can ship – responsibly, with a clear eye on quality, security, and the failure modes these tools introduce

Excellent written and verbal communication; able to explain research tradeoffs to engineers, PMs, and customers

Ability to work in our Palo Alto office 2-3 days a week

Preferred

M.S. or Ph.D. in Computer Science, Machine Learning, Statistics, Physics, or a related quantitative field

Published research at top ML or NLP venues (NeurIPS, ICML, ICLR, ACL, EMNLP, etc.)

Experience with reinforcement learning, RLHF, RLAIF, or preference-based fine-tuning

Experience with synthetic data generation pipelines at scale

Background in AI safety, red-teaming, or adversarial ML

Experience working with enterprise customers in regulated industries (finance, healthcare, government)