Daniel Tan


Google Scholar ID: QKO1QacAAAAJ
UCL
Alignment, ML, Robotics
Citations & Impact
All-time
Citations: 165
H-index: 6
i10-index: 6
Publications: 12
Co-authors: 37
Resume (English only)
Academic Achievements
  • Has made substantial contributions to several papers, including 'Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time', 'Easily steer OOD generalisation by adding one line to training data', 'Emergent Misalignment: Narrow Finetuning can lead to Broad Misalignment', 'Models finetuned to write insecure code learn to admire Nazis', 'Analyzing the Generalization and Reliability of Steering Vectors' (accepted at NeurIPS 2024), 'Towards Generalist Robot Learning from Internet Video: A Survey' (in proceedings, JAIR).
Research Experience
  • Posts frequent updates on LessWrong and Twitter.
Education
  • Currently completing MATS 7.0 with Owain Evans. Also a PhD student at University College London, supervised by Paige Brooks and supported by the Agency for Science, Technology and Research (A*STAR).
Background
  • Has a broad interest in AI alignment and AGI risk. Current focus is on understanding and evaluating the legibility of models' chain-of-thought reasoning. Also interested in steganography, prosaic interpretability, and alignment failure modes.
Miscellany
  • Shares updates on LessWrong and Twitter.