Nathaniel Li

Google Scholar ID: 2XmBzbcAAAAJ
Meta AI
Machine Learning, Benchmarks, ML Safety
Citations & Impact
All-time
Citations: 1,967
H-index: 7
i10-index: 6
Publications: 7
Co-authors: 8
Resume (English only)
Academic Achievements
  • Humanity's Last Exam (HLE): Developed an extremely challenging LLM benchmark with 3,000 expert-designed questions across mathematics, philosophy, and sciences, highlighting a significant gap between AI and human experts (arXiv Preprint).
  • LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet: Demonstrated that current LLM defenses fail against multi-turn human adversarial attacks; accepted as an oral presentation at the NeurIPS 2024 Red Teaming Workshop.
  • The WMDP Benchmark: Introduced a benchmark measuring hazardous knowledge in biosecurity, cybersecurity, and chemical security, and proposed RMU, an unlearning method that reduces harmful capabilities while preserving general performance (ICML 2024).
  • Virology Capabilities Test (VCT): Co-developed a multimodal virology Q&A benchmark assessing LLMs’ ability to troubleshoot lab protocols (arXiv Preprint).
  • HarmBench: Contributed to a standardized evaluation framework for automated red teaming, benchmarking 18 methods and 33 LLMs/defenses (ICML 2024).
  • Representation Engineering: Participated in developing a top-down approach to AI transparency (arXiv Preprint).