'Web-Shepherd' and 'Verbal Confidence' accepted to NeurIPS 2025 (July 2025).
'M-Prometheus' accepted to COLM 2025 (July 2025).
'AgoraBench' and 'LLM-as-an-Interviewer' accepted to ACL 2025 and ACL 2025 Findings, respectively (May 2025).
'BiGGen Bench' received Best Paper Award at NAACL 2025 (April 2025); both 'BiGGen Bench' and 'KMMLU' accepted to NAACL 2025 (January 2025).
'Evaluation-time Scaling' and 'M-Prometheus' papers released (March/April 2025).
'MBR Decoding / Distillation', 'Data Provenance Gap', and 'Pangea' accepted to ICLR 2025 (January 2025).
'System Message Generalization' and 'Consent in Crisis' accepted to NeurIPS 2024 (September 2024).
'Prometheus 2' and 'Think-and-Execute' accepted to EMNLP 2024; 'Self-Explore' to EMNLP 2024 Findings (September 2024).
'LangBridge' and 'Multi-Task Inference' accepted to ACL 2024; 'Prometheus-Vision' to ACL 2024 Findings (May 2024).
'Prometheus' and 'Flask' accepted to ICLR 2024 (January 2024).
'CoT Collection' accepted to EMNLP 2023 (October 2023).
'ExpertLM' accepted to ICML 2023 (April 2023).
'CoTEVer' accepted to the EACL 2023 Demo Track (February 2023).
'SICK' accepted to COLING 2022 (October 2022).
Reached 1,000 Google Scholar citations in April 2024.
Awarded the NEC Student Research Fellowship in October 2024 to support research on leveraging synthetic data for improving LLMs.
Background
Ph.D. student at Carnegie Mellon University's Language Technologies Institute (LTI), focusing on evaluation frameworks and post-training/inference algorithms for large language models (LLMs).
Primary research interests include developing better evaluation benchmarks (e.g., LLM-as-a-Judge, Reward Models, Meta-evaluation for Evaluators) and algorithms that leverage evaluation feedback (e.g., Self-Improvement, Weak-to-Strong Generalization).
Also interested in multimodality and multilinguality.
Inspired by Lord Kelvin's saying, "If you cannot measure it, you cannot improve it," and the Korean proverb, "A tree with deep roots is not shaken by the wind." Both underscore the importance of understanding why models succeed or fail in order to advance frontier models.
Hosts weekly office hours for research discussions and Ph.D. application advice, and is open to mentoring enthusiastic students (CMU-affiliated or not), especially those interested in evaluating hard-to-define but important model properties.