Scholar

Boyi Wei

Google Scholar ID: sRDckqEAAAAJ

PhD student, Princeton University

AI Safety, Alignment

Citations & Impact

All-time

Citations: 451

H-index: 6

i10-index: 6

Publications: 8

Co-authors: 41

Publications

8 items

Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism

2026 · Cited by 0

Best Practices for Biorisk Evaluations on Open-Weight Bio-Foundation Models

2025 · Cited by 0

Scaling Latent Reasoning via Looped Language Models

2025 · Cited by 0

Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation

2025 · Cited by 0

Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents

2025 · Cited by 0

Dynamic Risk Assessments for Offensive Cybersecurity Agents

2025 · Cited by 0

An Adversarial Perspective on Machine Unlearning for AI Safety

2024 · Cited by 14

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors

arXiv.org · 2024 · Cited by 36

Co-authors

41 total

Google DeepMind

Peter Henderson

Princeton University

Professor, Princeton University

Princeton University

Department of Computer Science, Princeton University

Udari Madhushani Sehwag

Research Scientist, Scale AI