About the job
As a Principal Applied Scientist focused on RL post-training, you will lead the design and deployment of learning systems that shape how our models behave in real products. You will own the technical direction and strategy for post-training and adaptation of large models to align behavior with user value, safety, and business objectives.
Responsibilities
Lead the technical direction and strategy for RL post-training of production models, partnering with other scientists, engineers, and product leaders to align models with customer and business needs.
Design and implement post-training pipelines that combine techniques such as supervised fine-tuning on curated demonstrations, preference modeling and pairwise ranking, and alignment approaches such as RLHF, RLAIF, or DPO for multi-objective optimization.
Develop reward models and objective formulations that balance constraints such as helpfulness, safety, fairness, compliance, and customer satisfaction, and iterate on them using human and AI feedback at scale through online and batch adaptation loops with strong guardrails.
Translate conversational logs, behavioral signals, and structured attributes into training, reward, and evaluation signals for post-training and reinforcement learning, turning heterogeneous data into actionable supervision.
Partner with model and platform teams to improve the efficiency and robustness of training and evaluation, including off-policy evaluation, replay strategies, controlled rollouts, and metrics and evaluation frameworks such as win-rates versus baselines and safety and quality metrics.
Qualifications
Minimum
You have a PhD or equivalent experience in Computer Science, Electrical Engineering, Statistics, or a related field, with emphasis in areas such as reinforcement learning, bandits, large language models, or applied machine learning.
You have strong, current expertise in post-training techniques (such as supervised fine-tuning, DPO, RLHF/RLAIF, preference modeling, and multi-objective optimization), in evaluation and monitoring of aligned models (including win-rate experiments, human and AI feedback loops, long-horizon evaluation, and safety or guardrail metrics), and in modern transformer-based models and tooling such as LLMs, multimodal models, vector search, and orchestration frameworks.
You have experience working with cross-functional partners (for example, engineering, product, design, operations, legal, and compliance) in domains where safety, trust, or regulation matter, such as marketplaces, finance, healthcare, or other high-stakes verticals.
You demonstrate technical leadership and mentorship: you help senior engineers and scientists grow, create clarity amid ambiguity, and drive alignment across teams. You communicate complex technical ideas clearly, in writing and verbally, to both expert and non-expert audiences.
Preferred
You are an applied scientist who is excited to use reinforcement learning and post-training methods to shape how AI systems behave in complex, high-judgment settings, and you are comfortable owning ambiguous problems end-to-end—from framing the objective and data strategy to shipping models into production and measuring their impact.