Lei Ke

Google Scholar ID: WseeNrUAAAAJ

Researcher, Tencent AI (Seattle)

Computer Vision, Machine Learning, Multi-modal LLMs

Homepage | Google Scholar

Citations & Impact

Citations (all-time): 2,891

H-index

i10-index

Publications

Co-authors: list available

Contact

Email: keleiwhu@gmail.com | CV | Twitter | GitHub

Publications

21 items

AudioX-Turbo: A Unified Framework for Efficient Anything-to-Audio Generation

2026

Stream3D-VLM: Online 3D Spatial Understanding with Incremental Geometry Priors

2026

AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization

2026

Unlocking Dense Metric Depth Estimation in VLMs

2026

Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

2026

JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments

2026

Beyond VLM-Based Rewards: Diffusion-Native Latent Reward Modeling

2026

Stable and Efficient Single-Rollout RL for Multimodal Reasoning

2025

Resume (English only)

Academic Achievements

He has published papers including 'Segment Anything in High Quality' (NeurIPS 2023), 'Matching Anything By Segmenting Anything' (CVPR 2024), and 'Gaussian Grouping: Segment and Edit Anything in 3D Scenes' (ECCV 2024). He has been involved in projects such as Gaussian Grouping, HQ-SAM, and SAM-PT, and has presented research at international conferences including ICCV, CVPR, and ECCV. He served as an Area Chair for NeurIPS 2025 and ICLR 2026 and organized a workshop at ICCV 2025.

Research Experience

He is currently a Senior Researcher at Tencent AI Seattle. He was previously a Postdoctoral Research Associate at Carnegie Mellon University, working with Prof. Katerina Fragkiadaki, and a visiting PhD student at the Computer Vision Lab, ETH Zurich, supervised by Prof. Fisher Yu and Dr. Martin Danelljan.

Education

He obtained his Ph.D. from the Department of Computer Science and Engineering (CSE) at HKUST in mid-2023, supervised by Chi-Keung Tang and Yu-Wing Tai. During his PhD, he also spent two years as a visiting scholar at ETH Zurich. He received his B.E. from the School of Computer Science at Wuhan University.

Background

Senior Research Scientist at Tencent AI, Seattle Lab. His primary research interest is building multimodal foundation systems, especially for visual understanding, reasoning, and generation. Previously, he was a Postdoctoral Research Associate in the School of Computer Science at Carnegie Mellon University and a visiting PhD student in the Computer Vision Lab at ETH Zurich.

Miscellany

His open-source projects have received more than 10K GitHub stars. He gave a guest lecture on Vision Foundation Models at Texas A&M University and delivered talks on Scene Understanding with Vision Foundation Models at Stanford SVL and MARVL.

Co-authors

17 total