Humanity's Last Exam (HLE): Developed an extremely challenging LLM benchmark of 3,000 expert-designed questions spanning mathematics, philosophy, and the sciences, highlighting a significant gap between AI systems and human experts (arXiv Preprint).
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet: Demonstrated that current LLM defenses fail against multi-turn human adversarial attacks; accepted as an Oral presentation at the NeurIPS 2024 Red Teaming Workshop.
The WMDP Benchmark: Introduced a benchmark measuring hazardous knowledge in biosecurity, cybersecurity, and chemical security, and proposed RMU—an unlearning method that reduces harmful capabilities while preserving general performance (ICML 2024).
Virology Capabilities Test (VCT): Co-developed a multimodal virology Q&A benchmark assessing LLMs’ ability to troubleshoot lab protocols (arXiv Preprint).
HarmBench: Contributed to a standardized evaluation framework for automated red teaming, benchmarking 18 attack methods against 33 target LLMs and defenses (ICML 2024).
Representation Engineering: Participated in developing a top-down approach to AI transparency (arXiv Preprint).