Shoubin Yu

Google Scholar ID: UqC5AoMAAAAJ

PhD Candidate at UNC Chapel Hill

Multimodal AI, Machine Learning, Computer Vision, Video Understanding

Citations & Impact

All-time

Citations

1,040

H-index

i10-index

Publications

Co-authors

list available

Contact

Twitter, GitHub

Publications

20 items

STORM: Internalized Modeling for Spatial-Temporal Reasoning in Video-Language Models

2026

EgoMemReason: A Memory-Driven Reasoning Benchmark for Long-Horizon Egocentric Video Understanding

2026

Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos

2026

VisionCoach: Reinforcing Grounded Video Reasoning via Visual-Perception Prompting

2026

Balancing Faithfulness and Performance in Reasoning via Multi-Listener Soft Execution

2026

When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning

2026

Prune-Then-Plan: Step-Level Calibration for Stable Frontier Exploration in Embodied Question Answering

2025

SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models

2025

Resume

Academic Achievements

Publications: MEXA (EMNLP 2025), SeViLA (NeurIPS 2023), CREMA (ICLR 2025), among others.

Awards: SciVideoBench won the Best Benchmark Paper Award at the ICCV 2025 KnowledgeMR workshop.

Projects: Research on multimodal reasoning, visual editing and generative methods, and multimodal representation and feature engineering.

Research Experience

Internships: MIT-IBM Watson AI Lab (2021), Amazon (2023), Adobe Research (2024), Google DeepMind (2025).

Education

Ph.D.: Computer Science, University of North Carolina at Chapel Hill. Advisor: Prof. Mohit Bansal.

B.S.: Shanghai Jiao Tong University.

Background

Research Interests: Multimodal AI, with a focus on enabling AI models to perceive and understand the world in ways comparable to or beyond human ability.

Professional Field: Computer Science.

Brief Introduction: Fourth-year Ph.D. student at the University of North Carolina at Chapel Hill, advised by Prof. Mohit Bansal.

Miscellany