Scholar

Rowan Wang

Google Scholar ID: Y4bU0bwAAAAJ

Unknown affiliation

Mechanistic Interpretability · Language Models


Citations & Impact

All-time

Citations: 1,399
H-index: 4
i10-index: 4
Publications: 5
Co-authors: 0


Publications

5 items

Introspection Adapters: Training LLMs to Report Their Learned Behaviors
2026 · Cited: 0

AuditBench: Evaluating Alignment Auditing Techniques on Models with Hidden Behaviors
2026 · Cited: 0

Believe It or Not: How Deeply do LLMs Believe Implanted Facts?
2025 · Cited: 0

Eliciting Secret Knowledge from Language Models
2025 · Cited: 0

Tamper-Resistant Safeguards for Open-Weight LLMs
arXiv.org · 2024 · Cited: 20

