Member of Technical Staff - Post-Training and RL

About the job

You will work on the most critical post-training and reinforcement learning challenges at any given time, including reward modeling, preference optimization (RLHF/DPO), and RL for improving reasoning, truthfulness, and real-world capabilities. You will get clarity on your first project before an offer.

Responsibilities

Work on the most critical post-training and reinforcement learning challenges at any given time

Focus on reward modeling, preference optimization (RLHF/DPO), and RL for improving reasoning, truthfulness, and real-world capabilities

Qualifications

Minimum

Believe truth-seeking AI is the most important and challenging problem

Obsessed with building incredibly useful models through post-training and RL techniques

Power user of AI models and eager to push the boundaries of what’s possible with reinforcement learning and alignment methods

Take pride in your work and thrive in meritocratic environments

Preferred

Previously worked on post-training or RLHF, or trained models used by millions of people