Lanyun Zhu

Google Scholar ID: urOSnlQAAAAJ

NTU, CityUHK, SUTD, BUAA

Multimodal Learning, Computer Vision, Resource-efficient Learning, Large Vision-Language Model

Citations & Impact

Citations (all-time): 1,584

Contact

Email: lanyun.zhu@ntu.edu.sg

Publications

24 items

Claw AI Lab: An Autonomous Multi-Agent Research Team

2026

Video-Zero: Self-Evolution Video Understanding

2026

4DVGGT-D: 4D Visual Geometry Transformer with Improved Dynamic Depth Estimation

2026

Aligning LLM Uncertainty with Human Disagreement in Subjectivity Analysis

2026

ARGUS: Policy-Adaptive Ad Governance via Evolving Reinforcement with Adversarial Umpiring

2026

StreamCacheVGGT: Streaming Visual Geometry Transformers with Robust Scoring and Hybrid Cache Compression

2026

Robust 4D Visual Geometry Transformer with Uncertainty-Aware Priors

2026

HD-VGGT: High-Resolution Visual Geometry Transformer

2026

Resume (English only)

Academic Achievements

Published multiple papers, including:
- AAAI 2026: Multi-Agent VLMs Guided Self-Training with PNU Loss
- NeurIPS 2025: Retrv-R1: A Reasoning-Driven MLLM Framework
- ICML 2025: CPCF: A Cross-Prompt Contrastive Framework
- CVPR 2025: POPEN: Preference-Based Optimization and Ensemble
- NeurIPS 2024: Hybrid Mamba for Few-Shot Segmentation
- ICML 2024: Discrete Latent Perspective Learning
- CVPR 2024: LLaFS: When Large Language Models Meet Few-Shot Segmentation
- CVPR 2024: Addressing Background Context Bias in Few-Shot Segmentation through Iterative Modulation

Research Experience

Currently a Research Fellow at the Rapid-Rich Object Search (ROSE) Lab, Nanyang Technological University, working with Professor Bihan Wen. Previously a postdoctoral fellow at City University of Hong Kong, working with Professor Shiqi Wang. Also worked at Megvii and SenseTime. Currently collaborating closely with NVIDIA, Alibaba (Professor Jieping Ye), and Tencent.

Education

Received a bachelor's degree from Beihang University in June 2020; obtained a Ph.D. from the Singapore University of Technology and Design (SUTD) in 2025, supervised by Professor Jun Liu.

Background

Research directions are multimodal learning and computer vision, currently focusing on multimodal large language models (MLLMs) and image segmentation. The research goal is to build efficient, trustworthy, and fine-grained multimodal systems that can process or integrate information from diverse modalities—such as text, images, videos, and data from other sensors—to effectively address a wide range of real-world industrial and scientific challenges.
