Model Behavior Engineer

About the job

You'll own the quality bar for Notion's AI products. You'll work with product and engineering teams to build systems that define what “good” looks like, measure our progress, and drive changes that deliver reliable, high-quality AI experiences. Your work directly shapes how Notion's AI products behave for millions of users.

Responsibilities

- Context engineering — Design, test, and iterate on system prompts, tool prompts, and context strategies that shape how Notion's AI products behave.

- Understand & debug — Live in production data: transcripts, logs, user feedback. Reproduce issues, identify root causes, and translate symptoms into actionable problem statements.

- Build evals & measurement — Design eval strategies, build datasets, run evaluations. Track quality over time. Identify issues before users do.

- Evaluate and launch new models — Work with leading research labs, including OpenAI, Anthropic, Google, and others, to evaluate and launch their models. Benchmark across dimensions: quality, latency, cost, edge cases.

- Drive quality priorities — Work embedded with eng and product teams to surface the most important issues. Own the quality narrative: severity, frequency, what to fix and why.

- Build tooling & systems — Help manage AI observability and eval platforms (e.g., Braintrust). Build the playbooks and tools that enable all teams at Notion to build AI products.

Qualifications

Minimum

- Driver mentality — You treat problems as yours. If something's broken, it's your job to fix it, even if you didn't cause it. You have a bias to action.

- Curiosity — You're excited about exploring the “jagged frontier” of LLM capabilities and how AI products meet reality.

- Analytical instinct — Your first move is to look at data. You can find signal in noise.

- Comfortable working with data — You can self-serve insights from large datasets, whether through SQL, coding agents, or other tools.

- Clear communication — You can explain complex issues simply.

- Experience with LLMs, prompting, or AI products

Preferred

- Background in engineering, product, data science, research, or consulting

- You've built something on your own to solve a problem — side project, startup, tool, whatever