'Web-Shepherd' and 'Verbal Confidence' accepted to NeurIPS 2025 (July 2025).
'M-Prometheus' accepted to COLM 2025 (July 2025).
'AgoraBench' and 'LLM-as-an-Interviewer' accepted to ACL 2025 and ACL 2025 Findings, respectively (May 2025).
'BiGGen Bench' received Best Paper Award at NAACL 2025 (April 2025); both 'BiGGen Bench' and 'KMMLU' accepted to NAACL 2025 (January 2025).
'Evaluation-time Scaling' and 'M-Prometheus' papers released (March/April 2025).
'MBR Decoding / Distillation', 'Data Provenance Gap', and 'Pangea' accepted to ICLR 2025 (January 2025).
'System Message Generalization' and 'Consent in Crisis' accepted to NeurIPS 2024 (September 2024).
'Prometheus 2' and 'Think-and-Execute' accepted to EMNLP 2024; 'Self-Explore' to EMNLP 2024 Findings (September 2024).
'LangBridge' and 'Multi-Task Inference' accepted to ACL 2024; 'Prometheus-Vision' to ACL 2024 Findings (May 2024).
'Prometheus' and 'Flask' accepted to ICLR 2024 (January 2024).
'CoT Collection' accepted to EMNLP 2023 (October 2023).
'ExpertLM' accepted to ICML 2023 (April 2023).
'CoTEVer' accepted to the EACL 2023 Demo Track (February 2023).
'SICK' accepted to COLING 2022 (October 2022).
Reached 1,000 Google Scholar citations in April 2024.
Awarded the NEC Student Research Fellowship in October 2024 to support research on leveraging synthetic data for improving LLMs.
Background
Ph.D. student at Carnegie Mellon University's Language Technologies Institute (LTI), focusing on evaluation frameworks and post-training/inference algorithms for large language models (LLMs).
Primary research interests include developing better evaluation benchmarks (e.g., LLM-as-a-Judge, Reward Models, Meta-evaluation for Evaluators) and algorithms that leverage evaluation feedback (e.g., Self-Improvement, Weak-to-Strong Generalization).
Also interested in multimodality and multilinguality.
Inspired by Lord Kelvin's saying, "If you cannot measure it, you cannot improve it," and the Korean proverb, "A tree with deep roots is not shaken by the wind." Both underscore the importance of understanding why models succeed or fail in order to advance frontier models.
Hosts weekly office hours for research discussions and Ph.D. application advice, and is open to mentoring enthusiastic students (CMU-affiliated or not), especially those interested in evaluating hard-to-define but important model properties.