
Bolin Lai

Google Scholar ID: lWrljmQAAAAJ

Georgia Institute of Technology

Multimodal Learning, LLM, Image Generation, Video Generation


Citations & Impact

Citations (all-time): 215


Contact

Email: bolin.lai@gatech.edu (also on Twitter, GitHub, and LinkedIn)

Publications

10 items

GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions

2026


Omni-MMSI: Toward Identity-attributed Social Interaction Understanding

2026


ARGaze: Autoregressive Transformers for Online Egocentric Gaze Estimation

2026


Toward Diffusible High-Dimensional Latent Spaces: A Frequency Perspective

2025


Incorporating Flexible Image Conditioning into Text-to-Video Diffusion Models without Training

2025


SocialGesture: Delving into Multi-person Gesture Understanding

2025


Learning Predictive Visuomotor Coordination

2025


Towards Online Multi-Modal Social Interaction Understanding

2025


Resume (English only)

Academic Achievements

Published papers: accepted at NeurIPS 2025, CVPR 2025, ECCV 2024, IJCV, and other venues. Awards: Outstanding Reviewer at CVPR 2025, Distinguished Paper at EgoVis (CVPR 2025), Best Student Paper Prize at BMVC, and others.

Research Experience

Research Scientist Intern at Meta Superintelligence Labs, Multimedia Core Video Generation Team (May 2025 – Present). Works on analyzing and improving the diffusibility of high-dimensional latent spaces, and has gained engineering experience with the large-scale MovieGen codebase and distributed training.

Education

PhD: Machine Learning Program at Georgia Institute of Technology, advised by Prof. James Rehg and co-advised by Prof. Zsolt Kira. Master's: ECE from Shanghai Jiao Tong University, advised by Prof. Ya Zhang. Bachelor's: Information Engineering from Shanghai Jiao Tong University.

Background

Research interests: Multimodal Learning, including Multimodal Understanding (e.g., VLMs, MLLMs) and Image/Video Generation (e.g., diffusion, flow matching). Career goal: to build omni multimodal systems that can understand, reason, and generate across text, image, video, and audio.

Miscellany

Looking for a full-time Research Scientist / Applied Scientist / ML Engineer position (available starting Dec. 2025). Open to collaborations with motivated graduate/undergraduate students.

Co-authors

15 total