Hannah Rose Kirk
Google Scholar ID: Fha8ldEAAAAJ
University of Oxford
Large language models · NLP · Ethics in AI · Alignment · AI Safety
Citations & Impact (All-time)
  • Citations: 2,778
  • h-index: 24
  • i10-index: 28
  • Publications: 20
  • Co-authors: 25
Academic Achievements
  • Oct 2024: 'LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages' accepted as an oral presentation at NeurIPS 2024 (top 0.5% of submissions).
  • Oct 2024: Contributed to 'The PRISM Alignment Project', exploring what participatory, representative, and individualized human feedback reveals about subjective and multicultural alignment of LLMs.
  • 2023–2024: Awarded Microsoft’s Accelerating Foundation Models Research Programme grant for 'Personalised and diverse feedback for humans-and-models-in-the-loop'.
  • 2022–2024: Awarded Meta AI Dynabench Grant for 'Optimizing feedback between humans-and-model-in-the-loop'.
  • 2020–2024: ESRC PhD Scholarship (Digital Social Science Pathway).
Research Experience
  • Sep 2024–Present: Research Scientist (Societal Impacts), UK AI Safety Institute, His Majesty's Government. Investigating social and psychological capabilities of frontier AI.
  • Sep–Dec 2023: Visiting Academic in Data Science, New York University. Collaborated with Prof. He and Prof. Bowman on human-AI coordination and LLM alignment.
  • Aug–Dec 2023: Red-Teamer and Consultant, OpenAI. Red-teamed DALL-E and GPT-4 models to improve their safety.
  • Feb–Oct 2023: External Student Researcher, Google. Co-hosted an adversarial challenge to identify unsafe failure modes in text-to-image models.
  • Sep 2021–Sep 2023: Data Scientist in Online Safety, The Alan Turing Institute. Worked on monitoring and detecting harmful language online.
  • Sep 2021–Jul 2023: Research Scientist, Rewire Online. Implemented NLP solutions for online safety.
  • Oct 2020–Oct 2023: Research Labs Manager, Oxford Artificial Intelligence Society. Led student research projects on AI bias.
  • Sep 2019–Sep 2020: Research Scholar, The Berggruen Institute, China Center. Explored links between Chinese philosophy, AI, and privacy.
Background
  • Currently pursuing a PhD at the University of Oxford and working as a Research Scientist at the UK AI Safety Institute.
  • Research focuses on human-and-model-in-the-loop feedback and data-centric AI alignment.
  • Passionate about the societal impacts of AI systems as they scale across capabilities, domains, and populations.
  • Published work spans computational linguistics, economics, ethics, and sociology, addressing alignment, bias, fairness, and hate speech from a multidisciplinary perspective.
  • Frequently collaborates with industry and policymakers.