About the job
We’re looking for Research Scientists who can drive effective RL or mid-training research in a small-team setting. You’ll own ambiguous, hard research problems end-to-end: forming hypotheses, designing experiments, building the training/eval/data needed to test them, and pushing results into the next model. You should expect significantly more scope and autonomy than in other research labs.
Responsibilities
- Improve our understanding of RL, including what it takes to handle longer-horizon tasks and to train with less compute
- Train graders to improve performance on coding tasks with non-verifiable reward
- Improve the quality and difficulty of the data points we use to train our models
- Work on real-time RL for coding agents (see https://cursor.com/blog/tab-rl)
Qualifications
Minimum
No minimum qualifications listed.
Preferred
- You have a deep background in RL and strong machine learning fundamentals
- You’re an excellent programmer and software engineer
- You can handle ambiguous research tasks with little guidance
- You care a lot about data quality, and can dive into the data when appropriate
- You are truth-seeking: you care more about learning the science than about proving your ideas correct