Publications
1. MoDA: Modulation Adapter for Fine-Grained Visual Grounding in Instructional MLLMs
2. FT2TF: First-Person Statement Text-To-Talking Face Generation
3. Multi-layer Learnable Attention Mask for Multimodal Tasks
4. Localizing Moments in Long Video Via Multimodal Guidance
Research Experience
Winter 2025, Research Intern at Samsung Research America. Worked on Reinforcement Learning and Multimodal Large Language Models; supported US and Latin American clients on data and business analytics (KPI decisions, market forecasting) and the automation of AI/ML pipelines; developed systems that process large-scale, real-time data for many concurrent users, leveraging distributed training and inference.
Education
PhD Candidate, Computer Science, Dartmouth College, Advisor: SouYoung Jin
Background
Research Interests: Multimodal Large Language Models (MLLMs), Computer Vision, Evaluation and Step-Verified Reasoning, Improved Multimodal Fusion through Learnable Masks, Cross-Modal Alignment, and Guidance Mechanisms for Long-Video Understanding.
Professional Fields: Video Understanding and Building End-to-End AI Datasets.
Miscellany
Python open-source contributor to projects advancing the computer vision and geospatial communities.