Jeonghye Kim

Google Scholar ID: koDFScAAAAAJ

PhD candidate, KAIST

Offline Reinforcement Learning, LLM post-training

Homepage, Google Scholar

Contact

LinkedIn

Publications

11 items

Are We Measuring Strategy or Phrasing? The Gap Between Surface- and Approach-Level Diversity in LLM Math Reasoning

2026

Be My Tutor: On-Policy Co-Distillation for Mutual LLM Improvement via Peer Feedback

2026

Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR

2026

Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?

2026

Understanding Reasoning in LLMs through Strategic Information Allocation under Uncertainty

2026

Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization

2026

Beyond Normalization: Rethinking the Partition Function as a Difficulty Scheduler for RLVR

2026

Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data

2025

Resume

Academic Achievements

- Publications:
  - [U1] Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
  - [C8] RL-Studio: A System for Multi-Phase Reinforcement Learning Experimentation
  - [W2] RAISE: Enhancing Scientific Reasoning in LLMs via Step-by-Step Retrieval
  - [C7] ReflAct: World-Grounded Decision Making in LLM Agents via Goal-State Reflection
  - [W1] Align While Search: Belief-Guided Exploratory Inference for Test-Time World Alignment
  - [C6] Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data
  - [C5] Online Pre-Training for Offline-to-Online Reinforcement Learning
  - [C4] ARS: Adaptive Reward Scaling for Multi-Task Reinforcement Learning
  - [C3] Adaptive Q-Aid for Conditional Supervised Learning in Offline Reinforcement Learning
  - [C2] Decision ConvFormer: Local Filtering in MetaFormer is Sufficient for Decision Making
  - [C1] LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework
- Awards:
  - ICML 2025 Spotlight (Top 2.6%)
  - ICLR 2024 Spotlight (Top 5%)

Research Experience

- Research Intern, Microsoft Research, Shanghai, China (2025.09-2026.02), Mentor: Xufang Luo
- Research Intern, Machine Intelligence Lab, SNU, Seoul, South Korea (2025.03-2025.09)
- Research Intern, LG AI Research, Seoul, South Korea (2024.03-2024.12), Mentor: Kanghoon Lee
- CEO, Dearplants, Daejeon, South Korea (2020.06-2021.11)

Education

- Ph.D. Candidate in Electrical Engineering, KAIST, Advisor: Prof. Youngchul Sung (2024.03-present)
- M.S. in Electrical Engineering, KAIST, Advisor: Prof. Youngchul Sung (2022.03-2024.02)
- B.S., School of Computing, KAIST (cum laude) (2015.03-2022.02)
- B.A. in Psychology, Bachelor's Degree Examination for Self-Education (BDES) (2018)

Background

Research Interests: How reinforcement learning can enhance the reasoning and decision-making of intelligent agents, especially by improving pretrained policies such as large language models and vision-language models.

Miscellany

Invited Talks:
- 2025.02 "Adaptive Q-Aid for Conditional Supervised Learning in Offline Reinforcement Learning" @ ML2
- 2023.10 "LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework" @ CARAI

Co-authors

18 total