Satyapriya Krishna
Google Scholar ID: Q5bfPlkAAAAJ
Harvard University
Trustworthy AI
Large Language Models
Explainable & Fair ML
Homepage
Google Scholar
Citations & Impact
All-time
Citations: 2,494
H-index: 19
i10-index: 25
Publications: 20
Co-authors: 0
Contact
Email: spkrishnaofficial@gmail.com
CV
Twitter
GitHub
LinkedIn
Publications
12 items
Evaluating Nova 2.0 Lite model under Amazon's Frontier Model Safety Framework
2026 · Cited: 2
From Narrow Unlearning to Emergent Misalignment: Causes, Consequences, and Containment in LLMs
2025 · Cited: 0
Self-Correcting Large Language Models: Generation vs. Multiple Choice
2025 · Cited: 0
The Alignment Auditor: A Bayesian Framework for Verifying and Refining LLM Objectives
2025 · Cited: 0
Learning from Failures: Understanding LLM Alignment through Failure-Aware Inverse RL
2025 · Cited: 0
D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models
2025 · Cited: 0
Evaluating the Critical Risks of Amazon's Nova Premier under the Frontier Model Safety Framework
2025 · Cited: 0
AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons
arXiv.org · 2025 · Cited: 0
Resume (English only)
Background
AI Researcher with a focus on the trustworthy aspects of generative models, including explainability, fairness, privacy, and robustness.
Miscellany
No information provided about personal interests or hobbies
Co-authors
0 total (list not available)