Scholar
Jitesh Jain
Google Scholar ID: nygnfNwAAAAJ
Georgia Tech
Image Segmentation
Multimodal Reasoning
Computer Vision
Citations & Impact (all-time)
Citations: 1,022
H-index: 7
i10-index: 7
Publications: 12
Co-authors: 10
Publications
6 items
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding
2026 · Cited: 6
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
2025 · Cited: 0
AUGUSTUS: An LLM-Driven Multimodal Agent System with Contextualized User Memory
2025 · Cited: 0
Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait
2025 · Cited: 0
Slow-Fast Architecture for Video Multi-Modal Large Language Models
2025 · Cited: 0
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation
arXiv.org · 2024 · Cited: 2
Co-authors
10 total
Humphrey Shi
Georgia Tech | UIUC || ...
Jianwei Yang
Research Scientist, Meta SuperIntelligence Lab
Zilong Huang
ByteDance Inc.
Ning Yu
Netflix Eyeline Studios
Yuqian Zhou
Senior Research Scientist at Adobe Research
Zhengyuan Yang
Principal Researcher, Microsoft