About the job
You will work on whichever post-training and reinforcement learning challenges are most critical at any given time, including reward modeling, preference optimization (RLHF/DPO), and RL for improving reasoning, truthfulness, and real-world capabilities. You will have clarity on your first project before receiving an offer.
Responsibilities
No responsibilities listed.
Qualifications
Minimum
You believe truth-seeking AI is the most important and challenging problem. You are obsessed with building incredibly useful models through post-training and RL techniques. You are a power user of AI models and eager to push the boundaries of what’s possible with reinforcement learning and alignment methods. If you have previously worked on post-training or RLHF, or have trained models used by millions of people, that is a big plus, but relevant experience is not required. You take pride in your work and thrive in meritocratic environments.
Preferred
No preferred qualifications listed.