About the job
We’re hiring a Data Scientist to help build, evaluate, and continuously improve mitigations that prevent extreme harms from AI systems. This role is for an experienced, highly autonomous individual contributor who can take ambiguous problem statements, structure rigorous analyses, and translate findings into actionable product and policy changes.
Responsibilities
Evaluate and improve mitigation systems, including classifiers and detection pipelines across domains (e.g., biosecurity, cybersecurity, and emerging risk areas).
Diagnose false positives and false negatives with deep error analysis, root cause investigation, and clear recommendations for mitigation adjustments.
Build monitoring and measurement frameworks to track mitigation effectiveness over time and across user segments and use cases.
Identify trends in over-blocking vs. under-blocking, quantify customer impact, and propose prioritized interventions.
Develop insights from customer feedback, complaints, and usage patterns to detect shifts in adversarial behavior and system failure modes.
Expand risk monitoring into new areas, including cybersecurity threats and model loss-of-control or sabotage scenarios, in partnership with domain experts.
Communicate results to technical and executive stakeholders with crisp narratives, decision-ready metrics, and clear tradeoffs.
Qualifications
Minimum
Significant experience in data science or applied analytics in high-stakes domains (e.g., security, trust & safety, abuse prevention, fraud, platform integrity, or reliability).
Strong foundations in experimentation, causal thinking, and/or observational inference; ability to design robust measurement under imperfect data.
Fluency in SQL and Python (or equivalent) for analysis, modeling, and building monitoring workflows.
Experience building metrics, dashboards, and operational monitoring that meaningfully change outcomes (not just reporting).
Track record of driving cross-functional impact with engineering, product, and research partners.
Cybersecurity data science experience (strong preference), including exposure to threat modeling, adversarial dynamics, abuse patterns, or security telemetry.
Experience with classifier evaluation, calibration, thresholding, and error analysis at scale.
Familiarity with detection systems in adversarial settings (e.g., evasion, distribution shift, feedback loops).
Preferred
Trust & Safety experience is helpful but not required.
Genuine interest in AI safety, alignment, and catastrophic risk prevention.