Academic Achievements
- NAACL 2025: 'Evaluating Multimodal Generative AI with Korean Educational Standards' – introduced KoNET, a Korean multimodal benchmark based on standardized tests
- ICLR 2025: 'How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?' – proposed a weight merging method to preserve safety and capability
- EMNLP 2024: 'On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding' – developed a more efficient VLM
- ACL 2024 Findings: 'Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation' – created an open-source VLM for fine-grained evaluation
- EMNLP 2023: 'Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models' – introduced the Cream model for document understanding
Research Experience
- Applied Research Scientist and Tech Lead at NAVER Cloud, managing multiple R&D projects
- Led open-source projects including Donut, Webvicob, and Cream
- Developed LLM-based multimodal products such as HyperCLOVA X Vision
- Conducts Ph.D. research at KAIST AI on multimodal machine learning
- Part-time lecturer at the University of Seoul, teaching AI engineering
Background
- Currently serving as an Applied Research Scientist and Technical Leader (Tech Lead) at NAVER Cloud
- Research interests broadly span Multimodal Machine Learning (Vision, Language, and Audio)
- Conducts research and software engineering for NAVER's LLM-based multimodal solutions and products (e.g., HyperCLOVA X Vision)
- Pursuing a Ph.D. in Artificial Intelligence at KAIST AI under the supervision of Prof. Minjoon Seo
- Serves as a part-time lecturer at the University of Seoul, teaching 'AI Engineering in Production'
- Focuses on building robust and generalizable machine learning systems for real-world applications