Member of Technical Staff - Post-Training and RL

xAI
Palo Alto, California, United States
2026-04-28

About the job

You will work on the most critical post-training and reinforcement learning challenges at any given time, including reward modeling, preference optimization (RLHF/DPO), and RL for improving reasoning, truthfulness, and real-world capabilities. You will get clarity on your first project before an offer is made.

Responsibilities

No responsibilities listed.

Qualifications

Minimum

You believe truth-seeking AI is the most important and challenging problem. You are obsessed with building incredibly useful models through post-training and RL techniques. You are a power user of AI models and eager to push the boundaries of what’s possible with reinforcement learning and alignment methods. Prior work on post-training or RLHF, or on training models used by millions of people, is a big plus, but relevant experience is not required. You take pride in your work and thrive in meritocratic environments.

Preferred

No preferred qualifications listed.