Henry Sleight
Google Scholar ID: FRHn0z4AAAAJ
Research Manager, Anthropic Fellows Program; Program Manager, Constellation
AI Safety
Adversarial Robustness
Model Organisms of Misalignment
Homepage
Google Scholar
Citations & Impact (all-time)
Citations: 395
H-index: 9
i10-index: 9
Publications: 19
Co-authors: 3
Publications
Abstractive Red-Teaming of Language Model Character (2026), cited 1
The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity? (2026), cited 0
Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs (2025), cited 0
Evaluating Control Protocols for Untrusted AI Agents (2025), cited 0
Believe It or Not: How Deeply do LLMs Believe Implanted Facts? (2025), cited 0
All Code, No Thought: Current Language Models Struggle to Reason in Ciphered Language (2025), cited 0
Stress-Testing Model Specs Reveals Character Differences among Language Models (2025), cited 0
Inoculation Prompting: Instructing LLMs to misbehave at train-time improves test-time alignment (2025), cited 0
Resume (English only)
Co-authors (3 total)
Ethan Perez (Anthropic)
John Hughes (Anthropic)
Rylan Schaeffer (Stanford University)