Tomek Korbak
Google Scholar ID: YQ5rrk4AAAAJ
UK AI Security Institute
Interests: language models, AI safety, reinforcement learning, chain of thought monitoring, LLM agents
Citations & Impact (all-time)
  • Citations: 4,090
  • H-index: 22
  • i10-index: 29
  • Publications: 20
  • Co-authors: 8
Resume (English only)
Academic Achievements
  • Published multiple papers at top-tier conferences (ICLR, ICML, NeurIPS, COLM), including:
  • “A sketch of an AI control safety case”
  • “Looking Inward: Language Models Can Learn About Themselves by Introspection” (ICLR 2025)
  • “Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data” (COLM 2024)
  • “Pretraining Language Models with Human Preferences” (ICML 2023, oral)
  • “On reinforcement learning and distribution matching for fine-tuning language models with no catastrophic forgetting” (NeurIPS 2022, oral)
  • Many of these papers are accompanied by open-source code.